DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows

Nontawat Tritrong; Puntawat Ponglertnapakorn; Supasorn Suwajanakorn

arxiv: 2304.09479 · v5 · submitted 2023-04-19 · 💻 cs.CV · cs.GR · cs.LG

Puntawat Ponglertnapakorn, Nontawat Tritrong, Supasorn Suwajanakorn

Pith reviewed 2026-05-24 08:40 UTC · model grok-4.3

classification 💻 cs.CV · cs.GR · cs.LG

keywords face relighting · diffusion models · cast shadows · single-view · in-the-wild · DDIM conditioning · shadow map

The pith

A conditional diffusion model relights single-view faces with consistent cast shadows by modulating DDIM steps with rendered shading and an inferred shadow map.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a face relighting method that avoids explicit decomposition into shape, albedo, and lighting. Instead it feeds off-the-shelf 3D and identity encodings plus a light code into a DDIM whose denoising is spatially guided by a rendered shading image and a simple shadow map. The approach trains on ordinary 2D photographs alone, without light-stage captures or paired relit data. If successful it produces temporally consistent cast shadows across lighting changes and reaches state-of-the-art scores on the Multi-PIE benchmark plus top user-study rankings.

Core claim

By conditioning a DDIM decoder on a disentangled light encoding together with a rendered shading reference and an inferred shadow map, the model can synthesize relit face images that preserve identity and geometry while adding realistic, temporally consistent cast shadows, all from a single network pass and without any ground-truth lighting supervision.
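
For readers unfamiliar with the decoder named in the claim, the deterministic DDIM update (the standard eta = 0 form) can be sketched as follows. This is a minimal illustration, not the paper's code: in the actual model the noise prediction `eps_pred` would come from the conditional network given the light, shape, and identity encodings, whereas here the true noise is passed in so the arithmetic can be checked. The names `ddim_step`, `a_t`, and `a_prev` are illustrative assumptions.

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to the previous step."""
    # Clean-image estimate implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the previous, less noisy level along the same noise direction.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

# Toy check: stand in for the network with the true noise (an assumption for illustration).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 8, 8))   # stand-in for a clean image
eps = rng.standard_normal((3, 8, 8))  # noise that a perfect network would predict
a_t, a_prev = 0.5, 0.8                # cumulative alpha products at t and t-1
x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
x_prev = ddim_step(x_t, eps, a_t, a_prev)
```

Because the update is deterministic, running it in the opposite direction gives the DDIM inversion that the figure captions refer to.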

What carries the argument

Conditional DDIM whose spatial modulation is performed by a rendered shading reference combined with a shadow map inferred from the input geometry.
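
One way to picture spatial modulation is a per-pixel scaling of a feature map by a signal computed from the stacked spatial conditions. The sketch below is an assumption-laden illustration only: the function `spatial_modulate`, the 1x1 linear map, and its random `weights` are hypothetical stand-ins for the paper's learned Modulator, whose architecture is not described on this page, and the segmentation masks are left out for brevity.

```python
import numpy as np

def spatial_modulate(features, shading_ref, shadow_map, weights, bias):
    """Scale a feature map per pixel using stacked spatial conditions.

    features:    (C, H, W) feature map inside the denoiser
    shading_ref: (3, H, W) rendered shading reference
    shadow_map:  (1, H, W) inferred shadow map
    weights:     (C, 4) hypothetical 1x1-convolution weights
    bias:        (C,)   hypothetical bias
    """
    # Concatenate the spatial conditions along the channel axis.
    cond = np.concatenate([shading_ref, shadow_map], axis=0)  # (4, H, W)
    # A 1x1 convolution is a per-pixel linear map across condition channels.
    scale = np.einsum("ck,khw->chw", weights, cond) + bias[:, None, None]
    # Multiplicative modulation: every pixel of every channel gets its own gain.
    return features * scale

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8, 8))
shading_ref = rng.random((3, 8, 8))
shadow_map = rng.random((1, 8, 8))
out = spatial_modulate(features, shading_ref, shadow_map,
                       rng.standard_normal((16, 4)), np.ones(16))
```

With zero weights and unit bias the function returns the features unchanged, which makes the role of the conditions easy to isolate.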

If this is right

Relighting no longer requires light-stage data, relit pairs, or multi-view images.
A single forward pass produces the relit image once pre-processing is done.
Cast shadows remain consistent when the same face is shown under changing target lights.
The single-shot model outperforms its teacher model on all reported metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same conditioning trick could be tested on non-face objects once reliable shape estimators exist.
If the shadow-map step generalizes, it may reduce the need for full global-illumination simulation in other diffusion relighting tasks.
Single-pass operation opens the possibility of applying the model to short video clips without per-frame retraining.

Load-bearing premise

Off-the-shelf 3D shape and facial-identity estimators supply inputs accurate enough that the simple shadow-map modulation does not introduce large visible errors in the final output.

What would settle it

Run the method on Multi-PIE test sequences using the same off-the-shelf estimators; if the generated cast shadows fail to match ground-truth shadow boundaries or show temporal flicker, the claim is falsified.
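
A test of this kind needs concrete measurements. The sketch below shows two simple candidates, assuming binary shadow masks and a stack of relit frames are already available: intersection-over-union for shadow agreement, and the mean absolute second-order frame difference as a flicker score (it stays near zero when brightness changes smoothly and grows when it jitters). Both functions are hypothetical illustrations, not metrics reported by the paper.

```python
import numpy as np

def shadow_iou(pred_mask, gt_mask):
    """Intersection-over-union between two binary shadow masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

def temporal_flicker(frames):
    """Mean absolute second-order difference along the time axis."""
    frames = np.asarray(frames, dtype=np.float64)
    return float(np.abs(np.diff(frames, n=2, axis=0)).mean())

# Two 4x4 masks that overlap on one row out of three covered rows.
pred = np.zeros((4, 4)); pred[:2] = 1
gt = np.zeros((4, 4)); gt[1:3] = 1
iou = shadow_iou(pred, gt)

# A smooth brightness ramp versus an alternating (flickering) sequence.
smooth = np.stack([np.full((4, 4), v) for v in (0.0, 0.1, 0.2, 0.3)])
jitter = np.stack([np.full((4, 4), v) for v in (0.0, 1.0, 0.0, 1.0)])
flicker_smooth = temporal_flicker(smooth)
flicker_jitter = temporal_flicker(jitter)
```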

Figures

Figures reproduced from arXiv: 2304.09479 by Nontawat Tritrong, Puntawat Ponglertnapakorn, Supasorn Suwajanakorn.

**Figure 1.** Our method addresses one of the most challenging relighting scenarios where input images contain strong highlights and [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]

**Figure 2.** Overview of DiFaReli++. We use off-the-shelf estimators to derive various encodings from the input image: segmentation masks, shadow map, (light, shape, camera) parameters, and face embedding. These encodings are then fed into a conditional DDIM via “spatial” and “non-spatial” conditioning techniques. For spatial conditioning, a shading reference, shadow map, and segmentation masks are concatenated and fed…

**Figure 3.** Computing the shadow map for training. We used a pretrained DiFaReli model to generate stronger and reduced versions of the input image, then identify shadow areas through pixel differences. Our process produces more accurate and spatially aligned shadow maps compared to ray-traced maps shown in red, which suffer from inaccurate lighting and geometry estimation.

**Figure 4.** Modifications of the Modulator’s input in DiFaReli++. The input is a concatenation of the shadow map, the shading reference, and segmentation masks (see all masks in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png]

**Figure 5.** Single-shot face relighting framework involves a) using DiFaReli++ to generate supervised relit pairs and b) training a single-shot relighting network with the same architecture as DiFaReli++ using the training pairs with a simple L2 loss. [PITH_FULL_IMAGE:figur…

**Figure 6.** Relit results on FFHQ [29]. The FFHQ dataset contains diverse face images captured in real-world environments. Our method produces more realistic relit images, as well as cast shadows, which can be controlled via the shadow map in the rightmost column. It effectively removes existing cast shadows and adds new ones. Additionally, it can relight non-facial parts (e.g., hats, hoodies, or shirts) to match the …

**Figure 7.** Relighting results on Multi-PIE [20] when the target lighting is taken from the same person (first row) and from a different person (second row). [PITH_FULL_IMAGE:figures/full_fig_p010_7.png]

**Figure 8.** Varying intensities of cast shadow. DiFaReli’s ability to change the intensity of cast shadows by adjusting the scalar c and decoding the modified feature vector (more in [PITH_FULL_IMAGE:figures/full_fig_p010_8.png]

**Figure 9.** Relighting with consistent cast shadows. Compared to four recent state-of-the-art methods [95], [27], [26], [50], DiFaReli++ effectively removes input cast shadows and synthesizes new ones in a realistic and consistent manner. The bottom row shows our shading references and shadow maps. Additional results are in Appendix ( [PITH_FULL_IMAGE:figures/full_fig_p011_9.png]

**Figure 10.** Results using various acceleration techniques with different sampling steps on an input image from FFHQ. While these techniques can reduce the sampling steps, they introduce artifacts and blurriness. In contrast, our distilled version of DiFaReli++ (DiFaReli++ss) delivers the highest quality, the least noisy output, and runs in just 0.07 seconds.

**Figure 11.** Trade-off between runtime and relighting performance of different acceleration techniques measured on three metrics: DSSIM, MSE, and LPIPS. The first row shows results on the test set where the target lighting is taken from the same subject, while the second row uses target lighting from a different subject. The red dashed line represents our single-shot face relighting score (DiFaReli++ss), and the magen…

**Figure 12.** Background conditioning ablation. Without background conditioning, non-facial regions like hats may disappear. Conditioning on raw pixels in DiFaReli preserves the hat, while conditioning on segmentation masks in DiFaReli++ss not only preserves it but also enables its relighting.

**Figure 13.** Improvements over DiFaReli’s failure cases. DiFaReli++ss better removes shadows cast by external objects (top) and better preserves sunglasses (bottom). [PITH_FULL_IMAGE:figures/full_fig_p013_13.png]

**Figure 14.** Failure cases. Our method a) may fail to add or remove cast shadows on non-facial parts (e.g., hats, clothing), b) may not produce shadows cast by external objects, or c) may mistakenly relight objects occluding the face (e.g., hands), leading to unrealistic relighting in some cases.

**Figure 15.** Diagram of one of the residual blocks inside the first [PITH_FULL_IMAGE:figures/full_fig_p016_15.png]

**Figure 16.** Diagram of one of the 3-layer MLPs in the non-spatial [PITH_FULL_IMAGE:figures/full_fig_p017_16.png]

**Figure 17.** Comparison of DiFaReli and DiFaReli++ pipelines. Differences are highlighted with red borders. Key changes are: 1) Background conditioning: replacing the background image with a concatenation of segmentation masks to enable relighting of non-facial parts. 2) Shadow estimator: using a shadow map with an encoded shadow scalar for improved consistency in cast shadows generation. 3) The cast shadow scalar c i…

**Figure 18.** Comparison against HoloRelighting [42] and visual analysis of our limitations. a) Our method better preserves fine details, such as hair and teeth, compared to HoloRelighting results, taken directly from their paper due to the lack of source code. Note that our target lighting was estimated using DECA [15] from the target image. b) The overall lighting in our result lacks the strong orange shading present…

**Figure 19.** Comparison against SwitchLight [31] and visual analysis of our limitations. SwitchLight’s results were taken directly from their paper due to the lack of source code. Our method addresses SwitchLight’s limitations: a) our method effectively removes hard cast shadows and better preserves makeup details, and b) produces sharper details. c) Our results appear less consistent with the target lighting, lacking…

**Figure 20.** Ablation study of the light conditioning (Section 4.3A in the main text). Panel labels: Ground truth; Used as non-spatial; Ours; No Modulator; Ours (DiFaReli); Input. [PITH_FULL_IMAGE:figures/full_fig_p021_20.png]

**Figure 21.** Ablation study of the non-spatial conditioning variable (Section 4.3B in the main text) [PITH_FULL_IMAGE:figures/full_fig_p021_21.png]

**Figure 22.** Relit results under rotating light around the forward axis (roll) on the FFHQ test set [29]. The order of results for each task is shuffled when displayed to each participant. Instructions and criteria for making selections are provided at the top of the page [PITH_FULL_IMAGE:figures/full_fig_p022_22.png]

**Figure 23.** Relit results under rotating light around the forward axis (roll) on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p023_23.png]

**Figure 24.** Relit results under rotating light around the forward axis (roll) on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p024_24.png]

**Figure 25.** Relit results under rotating light around the up axis (yaw) on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p025_25.png]

**Figure 26.** Relit results under rotating light around the up axis (yaw) on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p026_26.png]

**Figure 27.** Relit results on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p027_27.png]

**Figure 28.** Relit results on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p028_28.png]

**Figure 29.** Relit results on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p029_29.png]

**Figure 30.** Relit results on the FFHQ test set [29] [PITH_FULL_IMAGE:figures/full_fig_p030_30.png]

**Figure 31.** Improved DDIM sampling with mean-matching. We show a qualitative comparison between “with” and “without” mean-matching. Our mean-matching technique helps correct the overall brightness in both the inversion output and relit image [PITH_FULL_IMAGE:figures/full_fig_p031_31.png]

**Figure 32.** Varying the intensities of cast shadows on FFHQ [29] [PITH_FULL_IMAGE:figures/full_fig_p032_32.png]

**Figure 33.** Poor results from using ray-traced shadow maps for inversion. Using ray-traced shadow maps for DDIM inversion, the top result shows that non-shadow areas are over-brightened (highlighted with a red circle), while the bottom result shows a failure to remove shadows and closely follow the conditioning shadow map.

**Figure 34.** All segmentation masks used as conditioning inputs in DiFaReli++ (Section IV-B in the main text). Mask labels: Hair, Face skin, Eyes, Eyeballs, Glasses, Ears, Nose, Inside mouth, Upper lip, Lower lip, Neck, Cloth, Hat, … [PITH_FULL_IMAGE:figures/full_fig_p033_34.png]

**Figure 35.** Examples of proxy background images that serve as target lighting for IC-light [ [PITH_FULL_IMAGE:figures/full_fig_p034_35.png]

**Figure 36.** User interface for the relighting user study of facial and non-facial parts (Section V-C1 in the main text) [PITH_FULL_IMAGE:figures/full_fig_p035_36.png]

**Figure 37.** User interface for the relighting user study on relighting quality of controllable cast shadows (Section V-C2 in the main text). In the interface, these results are videos that play simultaneously [PITH_FULL_IMAGE:figures/full_fig_p036_37.png]
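
The caption of Figure 3 describes the shadow map as coming from pixel differences between a stronger-shadow and a reduced-shadow rendering of the same input. A minimal sketch of that idea follows. The function name `shadow_map_from_pair`, the channel-mean luminance, and the fixed `threshold` are assumptions made for illustration; the paper's actual post-processing is not given on this page.

```python
import numpy as np

def shadow_map_from_pair(strong_shadow_img, reduced_shadow_img, threshold=0.1):
    """Binary shadow map from two renderings of the same face.

    Both inputs are (H, W, 3) float images in [0, 1]. Pixels that are
    noticeably darker in the stronger-shadow image are marked as shadow.
    """
    # Channel mean as a simple luminance proxy (assumption).
    lum_strong = strong_shadow_img.mean(axis=-1)
    lum_reduced = reduced_shadow_img.mean(axis=-1)
    # Keep only pixels that get darker when the shadow is strengthened.
    diff = np.clip(lum_reduced - lum_strong, 0.0, None)
    return (diff > threshold).astype(np.float32)

# Toy pair: a uniformly lit image and a copy with a dark 3x3 patch.
reduced = np.full((8, 8, 3), 0.8)
strong = reduced.copy()
strong[2:5, 2:5] = 0.3
shadow = shadow_map_from_pair(strong, reduced)
```

Because both renderings come from the same input image, a map built this way stays pixel-aligned with the photograph, which is the advantage the caption claims over ray-traced maps.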

Original abstract

We introduce a novel approach to single-view face relighting in the wild, addressing challenges such as global illumination and cast shadows. A common scheme in recent methods involves intrinsically decomposing an input image into 3D shape, albedo, and lighting, then recomposing it with the target lighting. However, estimating these components is error-prone and requires many training examples with ground-truth lighting to generalize well. Our work bypasses the need for accurate intrinsic estimation and can be trained solely on 2D images without any light stage data, relit pairs, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We propose a novel conditioning technique that simplifies modeling the complex interaction between light and geometry. It uses a rendered shading reference along with a shadow map, inferred using a simple and effective technique, to spatially modulate the DDIM. Moreover, we propose a single-shot relighting framework that requires just one network pass, given pre-processed data, and even outperforms the teacher model across all metrics. Our method realistically relights in-the-wild images with temporally consistent cast shadows under varying lighting conditions. We achieve state-of-the-art performance on the standard benchmark Multi-PIE and rank highest in user studies. Please visit our page: https://diffusion-face-relighting-pp.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main move is conditioning a DDIM on a rendered shading reference plus a simple inferred shadow map to get cast-shadow consistency in single-image face relighting without light-stage data or intrinsic decomposition.

The core idea here is to skip explicit intrinsic decomposition for face relighting and instead feed a DDIM a light encoding plus shape and identity cues from off-the-shelf estimators, then spatially modulate it with a rendered shading reference and an inferred shadow map. This lets the model train on ordinary 2D photos only and still produce outputs with temporally consistent shadows under new lighting. They also describe a single-pass version that beats the teacher model on their metrics.

That conditioning trick is the actual novelty, and it directly targets the geometry-light interaction problem that usually requires paired or multi-view data. It is a practical step for applications like photo editing where collecting light-stage captures is not feasible. The approach builds on existing diffusion relighting work but changes the input modulation to avoid the usual error-prone decomposition step. The results are claimed to be SOTA on Multi-PIE, with top user-study scores and good shadow behavior on wild images.

The soft spot is the reliance on those off-the-shelf 3D estimators and the shadow-map inference. If the shape estimate is off by even a few degrees or millimeters, the shadow map will misalign and the diffusion step has no built-in correction. The paper calls the shadow technique simple and effective but does not show ablations with ground-truth geometry or failure cases on difficult poses, so the robustness claim is not fully secured. The abstract asserts state-of-the-art performance without giving the actual values or comparison tables, which makes the SOTA assertion hard to judge from the summary alone.

This work is aimed at people doing practical single-image relighting or avatar pipelines who already use diffusion models. A reader focused on conditioning tricks for diffusion in graphics would find the method worth examining if the full evaluation holds. It is coherent enough on its own terms to deserve a serious referee even though the evaluation details need tightening.

Referee Report

3 major / 2 minor

Summary. The paper introduces DiFaReli++, a single-view face relighting method for in-the-wild images that uses a conditional DDIM decoder on disentangled light, 3D shape, and identity encodings obtained from off-the-shelf estimators. It proposes a conditioning technique that spatially modulates the DDIM via a rendered shading reference plus an inferred shadow map (obtained by a 'simple and effective technique') to model light-geometry interactions, including cast shadows. The method is trained only on 2D images without light-stage data, relit pairs, or lighting ground truth; it claims a single-shot inference pass, SOTA quantitative results on Multi-PIE, highest user-study rankings, and temporally consistent cast shadows under varying lighting.

Significance. If the conditioning technique and error propagation from off-the-shelf estimators can be shown to be robust, the approach would be significant for enabling realistic relighting with consistent shadows without requiring accurate intrinsic decomposition or specialized training data. The single-pass inference and avoidance of light-stage supervision are practical strengths.

major comments (3)

[Abstract, §3] Abstract and §3 (Method): the central claim that the rendered shading + inferred shadow map 'simplifies modeling the complex interaction between light and geometry' and produces 'temporally consistent cast shadows' rests on unverified accuracy of the shadow map when derived from off-the-shelf 3D estimators; no quantitative propagation analysis, no ablation with ground-truth geometry, and no failure-case study on pose/expression variation (where shape errors are known to be large) are provided, leaving the link between 2D-only training and output consistency unsecured.
[§4] §4 (Experiments): the assertion of 'state-of-the-art performance on the standard benchmark Multi-PIE' and 'outperforms the teacher model across all metrics' is stated without any reported quantitative tables, error bars, or per-metric comparisons in the abstract and is not accompanied by ablation details on the shadow-map component, which is load-bearing for the consistency claim.
[§3.2] §3.2 (Conditioning technique): the shadow-map inference is described as 'simple and effective' yet no explicit formulation, pseudocode, or sensitivity analysis to input shape error (e.g., angular or depth deviation) is given, so it is impossible to assess whether the DDIM can correct misaligned shadows or merely propagates them.

minor comments (2)

[Abstract] The abstract states results on Multi-PIE and user studies but does not cite the exact table or figure numbers where these appear; adding explicit cross-references would improve readability.
[§3] Notation for the 'disentangled light encoding' and 'other encodings related to 3D shape and facial identity' should be defined with symbols or a diagram in §3 to avoid ambiguity when describing the DDIM conditioning.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, proposing revisions where appropriate to strengthen the paper while remaining faithful to the presented work.

Referee: [Abstract, §3] Abstract and §3 (Method): the central claim that the rendered shading + inferred shadow map 'simplifies modeling the complex interaction between light and geometry' and produces 'temporally consistent cast shadows' rests on unverified accuracy of the shadow map when derived from off-the-shelf 3D estimators; no quantitative propagation analysis, no ablation with ground-truth geometry, and no failure-case study on pose/expression variation (where shape errors are known to be large) are provided, leaving the link between 2D-only training and output consistency unsecured.

Authors: We agree that additional analysis would strengthen the claims regarding robustness. The current manuscript does not include quantitative error propagation analysis or ablations with ground-truth geometry, as our approach is explicitly designed for 2D-only training without such supervision. In revision, we will add a failure-case study examining performance under pose and expression variations to better illustrate behavior with shape estimation errors. We maintain that the Multi-PIE results and user study provide supporting evidence for consistency, but acknowledge the value of the suggested additions. revision: partial
Referee: [§4] §4 (Experiments): the assertion of 'state-of-the-art performance on the standard benchmark Multi-PIE' and 'outperforms the teacher model across all metrics' is stated without any reported quantitative tables, error bars, or per-metric comparisons in the abstract and is not accompanied by ablation details on the shadow-map component, which is load-bearing for the consistency claim.

Authors: Quantitative tables with per-metric comparisons on Multi-PIE, including outperformance over the teacher model, are reported in Section 4 of the manuscript. Abstracts are summaries and do not typically contain full tables or error bars. To address the concern, we will add error bars to the existing tables and include a dedicated ablation study on the shadow-map component in the revised experiments section. revision: yes
Referee: [§3.2] §3.2 (Conditioning technique): the shadow-map inference is described as 'simple and effective' yet no explicit formulation, pseudocode, or sensitivity analysis to input shape error (e.g., angular or depth deviation) is given, so it is impossible to assess whether the DDIM can correct misaligned shadows or merely propagates them.

Authors: We will revise Section 3.2 to include the explicit formulation of the shadow-map inference technique, accompanying pseudocode, and a sensitivity analysis to input shape errors (such as angular or depth deviations) to the extent possible with available data. This will allow readers to better evaluate the conditioning mechanism. revision: yes

standing simulated objections not resolved

Quantitative ablation studies using ground-truth geometry for error-propagation analysis remain outstanding, because the method is trained solely on 2D images and has no access to such ground-truth data.

Circularity Check

0 steps flagged

No circularity: method uses external estimators and standard DDIM conditioning without self-referential reductions

The paper's approach relies on off-the-shelf 3D shape and identity estimators plus a simple shadow map inference to condition a standard DDIM, with training solely on 2D images. No equations, predictions, or derivations in the abstract or described framework reduce outputs to quantities defined by the method's own fitted parameters or self-citations. The conditioning technique is presented as a practical modulation step rather than a tautological construction, and claims rest on benchmark performance and user studies rather than internal self-definition. This is a standard empirical method paper with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to premises stated or implied there; no free parameters, axioms, or invented entities are explicitly quantified.

axioms (2)

domain assumption: Off-the-shelf 3D shape and identity estimators supply inputs accurate enough for the downstream diffusion conditioning to succeed.
Abstract states that shape and identity encodings are inferred from off-the-shelf estimators and used directly in the DDIM conditioning.
ad hoc to paper: A simple shadow-map inference technique combined with rendered shading can spatially modulate the DDIM to model light-geometry interactions.
Abstract presents this as the novel conditioning technique that simplifies the complex interaction.

pith-pipeline@v0.9.0 · 5814 in / 1411 out tokens · 21382 ms · 2026-05-24T08:40:22.561784+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We propose a novel conditioning technique that simplifies modeling the complex interaction between light and geometry. It uses a rendered shading reference along with a shadow map, inferred using a simple and effective technique, to spatially modulate the DDIM.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

We achieve state-of-the-art performance on the standard benchmark Multi-PIE and rank highest in user studies.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

103 extracted references · 103 canonical work pages · 10 internal anchors

[1]

Segdiff: Image segmentation with diffusion probabilistic models

Tomer Amit, Eliya Nachmani, Tal Shaharbany, and Lior Wolf. Segdiff: Image segmentation with diffusion probabilistic models. arXiv:2112.00390, 2021. 17

work page arXiv 2021
[2]

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, et al. ediffi: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv:2211.01324, 2022. 17

work page internal anchor Pith review Pith/arXiv arXiv 2022
[3]

Analytic-dpm: an an- alytic estimate of the optimal reverse variance in diffusion probabilistic models

Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. Analytic-dpm: an an- alytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv:2201.06503, 2022. 4

work page arXiv 2022
[4]

Label-efficient semantic segmentation with diffu- sion models

Dmitry Baranchuk, Ivan Rubachev, Andrey V oynov, Valentin Khrulkov, and Artem Babenko. Label-efficient semantic segmentation with diffu- sion models. arXiv:2112.03126, 2021. 17

work page arXiv 2021
[5]

Shape, illumination, and reflectance from shading

Jonathan T Barron and Jitendra Malik. Shape, illumination, and reflectance from shading. IEEE transactions on pattern analysis and machine intelligence, 37(8):1670–1687, 2014. 1, 3

work page 2014
[6]

A morphable model for the synthesis of 3d faces

V olker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques , pages 187–194, 1999. 3, 17

work page 1999
[7]

Chan, Connor Z

Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Trem- blay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3d generative adversarial networks, 2022. 4

work page 2022
[8]

Denoising likelihood score matching for conditional score-based data generation

Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, and Chun-Yi Lee. Denoising likelihood score matching for conditional score-based data generation. arXiv:2203.14206, 2022. 17

work page arXiv 2022
[9]

Diffusion models in vision: A survey

Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. arXiv:2209.04747, 2022. 17 14

work page arXiv 2022
[10]

Arcface: Additive angular margin loss for deep face recognition

Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 4690–4699, 2019. 2, 5, 17

work page 2019
[11]

Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set

Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, and Xin Tong. Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 0–0, 2019. 17

work page 2019
[12]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems , 34:8780–8794, 2021. 6, 16, 17, 19

work page 2021
[13]

Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks , 107:3–11, 2018. 6

[14]

Near perfect gan inversion

Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, and Aleix Martinez. Near perfect gan inversion. arXiv:2202.11833, 2022. 4

[15]

Learning an animatable detailed 3D face model from in-the-wild images

Yao Feng, Haiwen Feng, Michael J. Black, and Timo Bolkart. Learning an animatable detailed 3D face model from in-the-wild images. volume 40, 2021. 2, 4, 5, 8, 18, 19

[16]

Learning an animatable detailed 3d face model from in-the-wild images

Yao Feng, Haiwen Feng, Michael J Black, and Timo Bolkart. Learning an animatable detailed 3d face model from in-the-wild images. ACM Transactions on Graphics (ToG), 40(4):1–13, 2021. 17

[17]

Controllable light diffusion for portraits

David Futschik, Kelvin Ritland, James Vecore, Sean Fanello, Sergio Orts-Escolano, Brian Curless, Daniel Sýkora, and Rohit Pandey. Controllable light diffusion for portraits, 2023. 3

[18]

Unsupervised training for 3d morphable model regression

Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, and William T Freeman. Unsupervised training for 3d morphable model regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8377–8386, 2018. 17

[19]

Generative adversarial networks

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020. 4

[20]

Multi-pie

Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker. Multi-pie. Image and vision computing, 2010. 3, 9, 10, 16

[21]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 5

[22]

Diffrelight: Diffusion-based facial performance relighting

Mingming He, Pascal Clausen, Ahmet Levent Taşel, Li Ma, Oliver Pilarski, Wenqi Xian, Laszlo Rikker, Xueming Yu, Ryan Burgert, Ning Yu, et al. Diffrelight: Diffusion-based facial performance relighting. In SIGGRAPH Asia 2024 Conference Papers, pages 1–12, 2024. 4

[23]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020. 5, 6, 17

[24]

Cascaded diffusion models for high fidelity image generation

Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res., 23:47–1, 2022. 17

[25]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv:2207.12598, 2022. 17

[26]

Face relighting with geometrically consistent shadows

Andrew Hou, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Face relighting with geometrically consistent shadows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4217–4226, 2022. 1, 3, 4, 7, 8, 9, 10, 11, 12, 16, 18

[27]

Towards high fidelity face relighting with realistic shadows

Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Towards high fidelity face relighting with realistic shadows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14719–14728, 2021. 1, 3, 4, 8, 9, 10, 11, 12, 18

[28]

3d face reconstruction with geometry details from a single image

Luo Jiang, Juyong Zhang, Bailin Deng, Hao Li, and Ligang Liu. 3d face reconstruction with geometry details from a single image. IEEE Transactions on Image Processing, 27(10):4756–4770, 2018. 17

[29]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019. 4, 9, 12, 16, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32

[30]

Zhanghan Ke, Chunyi Sun, Lei Zhu, Ke Xu, and Rynson W.H. Lau. Harmonizer: Learning to perform white-box image and video harmonization. In European Conference on Computer Vision, 2022. 4

[31]

Switchlight: Co-design of physics-driven architecture and pre-training framework for human portrait relighting

Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. Switchlight: Co-design of physics-driven architecture and pre-training framework for human portrait relighting, 2024. 1, 3, 4, 9, 17, 18, 20

[32]

Illumination-invariant face recognition with deep relit face images

Ha A Le and Ioannis A Kakadiaris. Illumination-invariant face recognition with deep relit face images. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019. 1, 3

[33]

Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 2017. 5, 6, 17

[34]

A closed-form solution to photorealistic image stylization

Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, and Jan Kautz. A closed-form solution to photorealistic image stylization. In Proceedings of the European Conference on Computer Vision (ECCV), 2018. 4

[35]

Feature-preserving detailed 3d face reconstruction from a single image

Yue Li, Liqian Ma, Haoqiang Fan, and Kenny Mitchell. Feature-preserving detailed 3d face reconstruction from a single image. In Proceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production, pages 1–9, 2018. 17

[36]

Targeting Ultimate Accuracy: Face Recognition via Deep Embedding

Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, and Chang Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv:1506.07310, 2015. 17

[37]

Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022. 4, 10

[38]

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv:2211.01095, 2022. 4, 10

[39]

Deep photo style transfer

Fujun Luan, Sylvain Paris, Eli Shechtman, and Kavita Bala. Deep photo style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4990–4998, 2017. 4

[40]

Photoapp: Photorealistic appearance editing of head portraits

BR Mallikarjun, Ayush Tewari, Abdallah Dib, Tim Weyrich, Bernd Bickel, Hans Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Louis Chevallier, Mohamed A Elgharib, et al. Photoapp: Photorealistic appearance editing of head portraits. ACM Transactions on Graphics ,

[41]

Face-specific data augmentation for unconstrained face recognition

Iacopo Masi, Anh Tuân Trần, Tal Hassner, Gozde Sahin, and Gérard Medioni. Face-specific data augmentation for unconstrained face recognition. International Journal of Computer Vision, 127, 2019. 17

[42]

Holo-relighting: Controllable volumetric portrait relighting from a single image

Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, and Vishal M Patel. Holo-relighting: Controllable volumetric portrait relighting from a single image. arXiv:2403.09632, 2024. 1, 3, 4, 9, 17, 18, 19

[43]

Sdedit: Guided image synthesis and editing with stochastic differential equations

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2021. 17

[44]

Learning physics-guided face relighting under directional light

Thomas Nestmeyer, Jean-François Lalonde, Iain Matthews, and Andreas Lehrmann. Learning physics-guided face relighting under directional light. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5124–5133, 2020. 1, 3, 4, 8, 9, 10, 18

[45]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741, 2021. 6, 17

[46]

Vaes meet diffusion models: Efficient and high-fidelity generation

Kushagra Pandey, Avideep Mukherjee, Piyush Rai, and Abhishek Kumar. Vaes meet diffusion models: Efficient and high-fidelity generation. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. 17

[47]

Total relighting: learning to relight portraits for background replacement

Rohit Pandey, Sergio Orts Escolano, Chloe Legendre, Christian Haene, Sofien Bouaziz, Christoph Rhemann, Paul Debevec, and Sean Fanello. Total relighting: learning to relight portraits for background replacement. ACM Transactions on Graphics (TOG), 40(4):1–21, 2021. 1, 3, 4, 9, 10, 11, 17, 18

[48]

Relightify: Relightable 3d faces from a single image via diffusion models

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou, and Stefanos Zafeiriou. Relightify: Relightable 3d faces from a single image via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 3

[49]

Deep face recognition

Omkar M Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. 2015. 17

[50]

DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows

Puntawat Ponglertnapakorn, Nontawat Tritrong, and Supasorn Suwajanakorn. Difareli: Diffusion face relighting. arXiv:2304.09479, 2023. 2, 3, 7, 8, 9, 11, 12, 18

[51]

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv:2209.14988, 2022. 17

[52]

Diffusion autoencoders: Toward a meaningful and decodable representation

Konpat Preechakul, Nattanat Chatthee, Suttisak Wizadwongsa, and Supasorn Suwajanakorn. Diffusion autoencoders: Toward a meaningful and decodable representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10619–10629, 2022. 2, 5, 6, 7, 17, 19

[53]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 17

[54]

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020. 17

[55]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv:2204.06125, 2022. 17

[56]

Facelit: Neural 3d relightable faces

Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, and Oncel Tuzel. Facelit: Neural 3d relightable faces, 2023. 4

[57]

Relightful harmonization: Lighting-aware portrait background replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, and He Zhang. Relightful harmonization: Lighting-aware portrait background replacement, 2023. 3, 4, 9

[58]

Encoding in style: a stylegan encoder for image-to-image translation

Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021. 4

[59]

Pivotal tuning for latent-based editing of real images

Daniel Roich, Ron Mokady, Amit H Bermano, and Daniel Cohen-Or. Pivotal tuning for latent-based editing of real images. ACM Transactions on Graphics (TOG), 42(1):1–13, 2022. 4

[60]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2021. 6

[61]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 17

[62]

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv:2208.12242, 2022. 17

[63]

Palette: Image-to-image diffusion models

Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–10, 2022. 17

[64]

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv:2205.11487,

[65]

Relightable gaussian codec avatars

Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 130–141, 2024. 4

[66]

Adversarial diffusion distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. arXiv:2311.17042, 2023. 3, 4

[67]

Facenet: A unified embedding for face recognition and clustering

Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015. 17

[68]

Sfsnet: Learning shape, reflectance and illuminance of faces in the wild

Soumyadip Sengupta, Angjoo Kanazawa, Carlos D Castillo, and David W Jacobs. Sfsnet: Learning shape, reflectance and illuminance of faces in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018. 1, 3, 10, 17, 18

[69]

Style transfer for headshot portraits

YiChang Shih, Sylvain Paris, Connelly Barnes, William T Freeman, and Frédo Durand. Style transfer for headshot portraits. 2014. 4

[70]

Portrait lighting transfer using a mass transport approach

Zhixin Shu, Sunil Hadap, Eli Shechtman, Kalyan Sunkavalli, Sylvain Paris, and Dimitris Samaras. Portrait lighting transfer using a mass transport approach. ACM Transactions on Graphics (TOG), 2017. 4, 17

[71]

Neural face editing with intrinsic image disentangling

Zhixin Shu, Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, and Dimitris Samaras. Neural face editing with intrinsic image disentangling. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5541–5550, 2017. 1, 3

[72]

D2c: Diffusion-decoding models for few-shot conditional generation

Abhishek Sinha, Jiaming Song, Chenlin Meng, and Stefano Ermon. D2c: Diffusion-decoding models for few-shot conditional generation. Advances in Neural Information Processing Systems, 34, 2021. 17

[73]

Deep unsupervised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 2256–2265. PMLR, 2015. 5, 17

[74]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. 2, 5, 6

[75]

Consistency Models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. arXiv:2303.01469, 2023. 3, 4

[76]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. 5, 17

[77]

Single image portrait relighting

Tiancheng Sun, Jonathan T Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul E Debevec, and Ravi Ramamoorthi. Single image portrait relighting. ACM Trans. Graph., 38(4):79–1, 2019. 3, 10, 18

[78]

Deepface: Closing the gap to human-level performance in face verification

Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014. 17

[79]

Pie: Portrait image embedding for semantic control

Ayush Tewari, Mohamed Elgharib, Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, and Christian Theobalt. Pie: Portrait image embedding for semantic control. ACM Transactions on Graphics (TOG), 39(6):1–14, 2020. 4

[80]

Stylerig: Rigging stylegan for 3d control over portrait images

Ayush Tewari, Mohamed Elgharib, Gaurav Bharaj, Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, and Christian Theobalt. Stylerig: Rigging stylegan for 3d control over portrait images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6142–6151, 2020. 4


Showing first 80 references.

[7] [7]

Chan, Connor Z

Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Trem- blay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3d generative adversarial networks, 2022. 4

work page 2022

[8] [8]

Denoising likelihood score matching for conditional score-based data generation

Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, and Chun-Yi Lee. Denoising likelihood score matching for conditional score-based data generation. arXiv:2203.14206, 2022. 17

work page arXiv 2022

[9] [9]

Diffusion models in vision: A survey

Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey. arXiv:2209.04747, 2022. 17 14

work page arXiv 2022

[10] [10]

Arcface: Additive angular margin loss for deep face recognition

Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages 4690–4699, 2019. 2, 5, 17

work page 2019

[11] [11]

Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set

Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, and Xin Tong. Accurate 3d face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops , pages 0–0, 2019. 17

work page 2019

[12] [12]

Diffusion models beat gans on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems , 34:8780–8794, 2021. 6, 16, 17, 19

work page 2021

[13] [13]

Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

Stefan Elfwing, Eiji Uchibe, and Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks , 107:3–11, 2018. 6

work page 2018

[14] [14]

Near perfect gan inversion

Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, and Aleix Martinez. Near perfect gan inversion. arXiv:2202.11833, 2022. 4

work page arXiv 2022

[15] [15]

Black, and Timo Bolkart

Yao Feng, Haiwen Feng, Michael J. Black, and Timo Bolkart. Learning an animatable detailed 3D face model from in-the-wild images. vol- ume 40, 2021. 2, 4, 5, 8, 18, 19

work page 2021

[16] [16]

Learning an animatable detailed 3d face model from in-the-wild images

Yao Feng, Haiwen Feng, Michael J Black, and Timo Bolkart. Learning an animatable detailed 3d face model from in-the-wild images. ACM Transactions on Graphics (ToG) , 40(4):1–13, 2021. 17

work page 2021

[17] [17]

Con- trollable light diffusion for portraits, 2023

David Futschik, Kelvin Ritland, James Vecore, Sean Fanello, Sergio Orts-Escolano, Brian Curless, Daniel S ´ykora, and Rohit Pandey. Con- trollable light diffusion for portraits, 2023. 3

work page 2023

[18] [18]

Unsupervised training for 3d morphable model regression

Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, and William T Freeman. Unsupervised training for 3d morphable model regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 8377–8386, 2018. 17

work page 2018

[19] [19]

Gen- erative adversarial networks

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen- erative adversarial networks. Communications of the ACM , 63(11):139– 144, 2020. 4

work page 2020

[20] [20]

Multi-pie

Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker. Multi-pie. Image and vision computing , 2010. 3, 9, 10, 16

work page 2010

[21] [21]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , 2016. 5

work page 2016

[22] [22]

Diffrelight: Diffusion-based facial performance relighting

Mingming He, Pascal Clausen, Ahmet Levent Tas ¸el, Li Ma, Oliver Pilarski, Wenqi Xian, Laszlo Rikker, Xueming Yu, Ryan Burgert, Ning Yu, et al. Diffrelight: Diffusion-based facial performance relighting. In SIGGRAPH Asia 2024 Conference Papers , pages 1–12, 2024. 4

work page 2024

[23] [23]

Denoising diffusion prob- abilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion prob- abilistic models. Advances in Neural Information Processing Systems , 33:6840–6851, 2020. 5, 6, 17

work page 2020

[24] [24]

Cascaded diffusion models for high fidelity image generation

Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Moham- mad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. , 23:47–1, 2022. 17

work page 2022

[25] [25]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv:2207.12598, 2022. 17

work page internal anchor Pith review Pith/arXiv arXiv 2022

[26] [26]

Face relighting with geometrically consistent shadows

Andrew Hou, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Face relighting with geometrically consistent shadows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 4217–4226, 2022. 1, 3, 4, 7, 8, 9, 10, 11, 12, 16, 18

work page 2022

[27] [27]

Towards high fidelity face relighting with realistic shadows

Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Towards high fidelity face relighting with realistic shadows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 14719–14728, 2021. 1, 3, 4, 8, 9, 10, 11, 12, 18

work page 2021

[28] [28]

3d face reconstruction with geometry details from a single image

Luo Jiang, Juyong Zhang, Bailin Deng, Hao Li, and Ligang Liu. 3d face reconstruction with geometry details from a single image. IEEE Transactions on Image Processing , 27(10):4756–4770, 2018. 17

work page 2018

[29] [29]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , 2019. 4, 9, 12, 16, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32

work page 2019

[30] [30]

Zhanghan Ke, Chunyi Sun, Lei Zhu, Ke Xu, and Rynson W.H. Lau. Harmonizer: Learning to perform white-box image and video harmo- nization. In European Conference on Computer Vision , 2022. 4

work page 2022

[31] [31]

Switchlight: Co-design of physics-driven architecture and pre-training framework for human portrait relighting, 2024

Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. Switchlight: Co-design of physics-driven architecture and pre-training framework for human portrait relighting, 2024. 1, 3, 4, 9, 17, 18, 20

work page 2024

[32] [32]

Illumination-invariant face recog- nition with deep relit face images

Ha A Le and Ioannis A Kakadiaris. Illumination-invariant face recog- nition with deep relit face images. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) . IEEE, 2019. 1, 3

work page 2019

[33] [33]

Tianye Li, Timo Bolkart, Michael. J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia) , 2017. 5, 6, 17

work page 2017

[34] [34]

A closed-form solution to photorealistic image stylization

Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, and Jan Kautz. A closed-form solution to photorealistic image stylization. In Proceedings of the European Conference on Computer Vision (ECCV) , 2018. 4

work page 2018

[35] [35]

Feature- preserving detailed 3d face reconstruction from a single image

Yue Li, Liqian Ma, Haoqiang Fan, and Kenny Mitchell. Feature- preserving detailed 3d face reconstruction from a single image. In Proceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production , pages 1–9, 2018. 17

work page 2018

[36] [36]

Targeting Ultimate Accuracy: Face Recognition via Deep Embedding

Jingtuo Liu, Yafeng Deng, Tao Bai, Zhengping Wei, and Chang Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv:1506.07310, 2015. 17

work page internal anchor Pith review Pith/arXiv arXiv 2015

[37] [37]

Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022. 4, 10

work page 2022

[38] [38]

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv:2211.01095, 2022. 4, 10

work page internal anchor Pith review Pith/arXiv arXiv 2022

[39] [39]

Deep photo style transfer

Fujun Luan, Sylvain Paris, Eli Shechtman, and Kavita Bala. Deep photo style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4990–4998, 2017. 4

work page 2017

[40] [40]

Photoapp: Photorealistic appearance editing of head portraits

BR Mallikarjun, Ayush Tewari, Abdallah Dib, Tim Weyrich, Bernd Bickel, Hans Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Louis Chevallier, Mohamed A Elgharib, et al. Photoapp: Photorealistic appearance editing of head portraits. ACM Transactions on Graphics ,

work page

[41] [41]

Face-specific data augmentation for unconstrained face recognition

Iacopo Masi, Anh Tuấn Trần, Tal Hassner, Gozde Sahin, and Gérard Medioni. Face-specific data augmentation for unconstrained face recognition. International Journal of Computer Vision, 127, 2019. 17


[42] [42]

Holo-relighting: Controllable volumetric portrait relighting from a single image

Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, and Vishal M Patel. Holo-relighting: Controllable volumetric portrait relighting from a single image. arXiv:2403.09632, 2024. 1, 3, 4, 9, 17, 18, 19


[43] [43]

Sdedit: Guided image synthesis and editing with stochastic differential equations

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2021. 17


[44] [44]

Learning physics-guided face relighting under directional light

Thomas Nestmeyer, Jean-François Lalonde, Iain Matthews, and Andreas Lehrmann. Learning physics-guided face relighting under directional light. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5124–5133, 2020. 1, 3, 4, 8, 9, 10, 18


[45] [45]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741, 2021. 6, 17


[46] [46]

Vaes meet diffusion models: Efficient and high-fidelity generation

Kushagra Pandey, Avideep Mukherjee, Piyush Rai, and Abhishek Kumar. Vaes meet diffusion models: Efficient and high-fidelity generation. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. 17


[47] [47]

Total relighting: learning to relight portraits for background replacement

Rohit Pandey, Sergio Orts Escolano, Chloe Legendre, Christian Haene, Sofien Bouaziz, Christoph Rhemann, Paul Debevec, and Sean Fanello. Total relighting: learning to relight portraits for background replacement. ACM Transactions on Graphics (TOG), 40(4):1–21, 2021. 1, 3, 4, 9, 10, 11, 17, 18


[48] [48]

Relightify: Relightable 3d faces from a single image via diffusion models

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou, and Stefanos Zafeiriou. Relightify: Relightable 3d faces from a single image via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023. 3


[49] [49]

Deep face recognition

Omkar M Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. 2015. 17


[50] [50]

DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows

Puntawat Ponglertnapakorn, Nontawat Tritrong, and Supasorn Suwajanakorn. Difareli: Diffusion face relighting. arXiv:2304.09479, 2023. 2, 3, 7, 8, 9, 11, 12, 18


[51] [51]

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv:2209.14988, 2022. 17


[52] [52]

Diffusion autoencoders: Toward a meaningful and decodable representation

Konpat Preechakul, Nattanat Chatthee, Suttisak Wizadwongsa, and Supasorn Suwajanakorn. Diffusion autoencoders: Toward a meaningful and decodable representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10619–10629, 2022. 2, 5, 6, 7, 17, 19


[53] [53]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 17


[54] [54]

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020. 17


[55] [55]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv:2204.06125, 2022. 17


[56] [56]

Facelit: Neural 3d relightable faces, 2023

Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, and Oncel Tuzel. Facelit: Neural 3d relightable faces, 2023. 4


[57] [57]

Relightful harmonization: Lighting-aware portrait background replacement, 2023

Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, and He Zhang. Relightful harmonization: Lighting-aware portrait background replacement, 2023. 3, 4, 9


[58] [58]

Encoding in style: a stylegan encoder for image-to-image translation

Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021. 4


[59] [59]

Pivotal tuning for latent-based editing of real images

Daniel Roich, Ron Mokady, Amit H Bermano, and Daniel Cohen-Or. Pivotal tuning for latent-based editing of real images. ACM Transactions on Graphics (TOG), 42(1):1–13, 2022. 4


[60] [60]

High-resolution image synthesis with latent diffusion models, 2021

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2021. 6


[61] [61]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 17


[62] [62]

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. arXiv:2208.12242, 2022. 17


[63] [63]

Palette: Image-to-image diffusion models

Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–10, 2022. 17


[64] [64]

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv:2205.11487,


[65] [65]

Relightable gaussian codec avatars

Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 130–141, 2024. 4


[66] [66]

Adversarial diffusion distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. arXiv:2311.17042, 2023. 3, 4


[67] [67]

Facenet: A unified embedding for face recognition and clustering

Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015. 17


[68] [68]

Sfsnet: Learning shape, reflectance and illuminance of faces in the wild

Soumyadip Sengupta, Angjoo Kanazawa, Carlos D Castillo, and David W Jacobs. Sfsnet: Learning shape, reflectance and illuminance of faces in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018. 1, 3, 10, 17, 18


[69] [69]

Style transfer for headshot portraits

YiChang Shih, Sylvain Paris, Connelly Barnes, William T Freeman, and Frédo Durand. Style transfer for headshot portraits. 2014. 4


[70] [70]

Portrait lighting transfer using a mass transport approach

Zhixin Shu, Sunil Hadap, Eli Shechtman, Kalyan Sunkavalli, Sylvain Paris, and Dimitris Samaras. Portrait lighting transfer using a mass transport approach. ACM Transactions on Graphics (TOG), 2017. 4, 17


[71] [71]

Neural face editing with intrinsic image disentangling

Zhixin Shu, Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, and Dimitris Samaras. Neural face editing with intrinsic image disentangling. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5541–5550, 2017. 1, 3


[72] [72]

D2c: Diffusion-decoding models for few-shot conditional generation

Abhishek Sinha, Jiaming Song, Chenlin Meng, and Stefano Ermon. D2c: Diffusion-decoding models for few-shot conditional generation. Advances in Neural Information Processing Systems, 34, 2021. 17


[73] [73]

Deep unsupervised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 2256–2265. PMLR, 2015. 5, 17


[74] [74]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. 2, 5, 6


[75] [75]

Consistency Models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. arXiv:2303.01469, 2023. 3, 4


[76] [76]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. 5, 17


[77] [77]

Single image portrait relighting

Tiancheng Sun, Jonathan T Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul E Debevec, and Ravi Ramamoorthi. Single image portrait relighting. ACM Trans. Graph., 38(4):79–1, 2019. 3, 10, 18


[78] [78]

Deepface: Closing the gap to human-level performance in face verification

Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1701–1708, 2014. 17


[79] [79]

Pie: Portrait image embedding for semantic control

Ayush Tewari, Mohamed Elgharib, Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, and Christian Theobalt. Pie: Portrait image embedding for semantic control. ACM Transactions on Graphics (TOG), 39(6):1–14, 2020. 4


[80] [80]

Stylerig: Rigging stylegan for 3d control over portrait images

Ayush Tewari, Mohamed Elgharib, Gaurav Bharaj, Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, and Christian Theobalt. Stylerig: Rigging stylegan for 3d control over portrait images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6142–6151, 2020. 4
