HarmoVid: Relightful Video Portrait Harmonization

Jae Shin Yoon; Joon-Young Lee; Jun Myeong Choi; Luchao Qi; Roni Sengupta

arxiv: 2605.28811 · v1 · pith:IKM4J3LUnew · submitted 2026-05-27 · 💻 cs.CV

HarmoVid: Relightful Video Portrait Harmonization

Jun Myeong Choi , Jae Shin Yoon , Luchao Qi , Roni Sengupta , Joon-Young Lee This is my paper

Pith reviewed 2026-06-29 13:34 UTC · model grok-4.3

classification 💻 cs.CV

keywords video harmonizationrelightingdiffusion modelportrait videotemporal coherencelighting deflickeringalpha mask conditioning

0 comments

The pith

A video diffusion model trained on deflickered image-harmonized pairs produces temporally coherent relightful portrait harmonization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to adjust lighting, shadows, and color in foreground video portraits so they match a target background scene. Acquiring true paired video data with identical motion under different lights is impractical, so the approach starts with existing image harmonization models applied frame by frame and then corrects the resulting temporal jitter with a dedicated lighting deflickering model. The stabilized pairs, together with real and synthetic videos, train a video diffusion model that also uses asymmetric alpha mask conditioning to learn clean boundaries. If successful, the resulting videos maintain consistent lighting across frames while allowing expressive relighting that respects physical behavior.

Core claim

We present a method for harmonizing the lighting of a foreground video to match a target background scene, adjusting shadows, color tone, and illumination intensity. A novel lighting deflickering model stabilizes flickering artifacts in frame-by-frame image harmonization outputs, enabling a video diffusion model to learn from these upgraded pairs plus real and synthetic videos. Asymmetric alpha mask conditioning further supports clean boundaries from real videos, yielding results with strong temporal coherence and physically meaningful lighting.

What carries the argument

The lighting deflickering model that removes global and local flickering from image-based harmonization outputs, paired with asymmetric alpha mask conditioning inside the video diffusion model.

If this is right

The harmonized videos exhibit strong temporal coherence without flickering.
Lighting adjustments remain expressive while producing natural color tones and physically consistent shadows.
Boundaries between foreground and background appear cleaner than in prior frame-wise methods.
The same pipeline maintains performance on both real and synthetic input videos.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The deflickering step could be reused as a post-process for other video tasks that suffer from per-frame inconsistency.
Training on a mixture of real and synthetic data suggests the model may generalize to unseen motion and lighting combinations not present in the original image datasets.
If deflickering quality improves further, the need for large-scale real paired video capture could be reduced.

Load-bearing premise

Frame-by-frame image harmonization followed by the deflickering step yields paired training data free of systematic artifacts that would block the diffusion model from learning stable, physically plausible lighting.

What would settle it

Generate videos from the model and check whether they display persistent frame-to-frame lighting jitter or shadows that violate expected physical behavior under the target illumination; consistent failure on these checks would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.28811 by Jae Shin Yoon, Joon-Young Lee, Jun Myeong Choi, Luchao Qi, Roni Sengupta.

**Figure 1.** Figure 1: We present HarmoVid, a relightable video harmonization framework that aligns lighting between foreground and background. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Samples from our paired video harmonization dataset. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Our three-step pipeline for data synthesis and training for video harmonization. In the first state, we apply off-the-shelf image [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Our relightful video harmonization framework. Har [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison of our HarmoVid to other image- or video-based video harmonization methods. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Result on non-portrait foreground object. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Illustration of an example video from the user study. [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: The user study results for temporal consistency and identity preservation. [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

**Figure 9.** Figure 9: The user study results for overall harmonization quality. [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

read the original abstract

We present a method for harmonizing the lighting of a foreground video to match a target background scene, adjusting shadows, color tone, and illumination intensity (relightful harmonization). Unlike images, acquiring labeled data for videos, where identical motions are recorded under different lighting conditions, is practically infeasible and non-scalable. While one way to create such paired data is to apply existing image-based harmonization models frame by frame to a video, the resulting outputs often suffer from significant temporal jitters. We overcome this problem by introducing a novel lighting deflickering model that can stabilize the global and local lighting flickering artifacts. Our video diffusion model learns from these upgraded deflickered data with a volume of real and synthetic videos to generate high-quality video harmonization results. We further propose an asymmetric alpha mask conditioning technique to learn the clean boundaries from real videos. Experiments demonstrate that our model achieves strong temporal coherence, naturalness, cleaner boundaries, and physically meaningful lighting behavior, while maintaining strong relighting expressiveness compared to prior image-based and video-based harmonization methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main move is adding a lighting deflickering stage so frame-by-frame image harmonization can serve as training data for a video diffusion model with asymmetric mask conditioning.

read the letter

The one thing to know is that this work tackles the data problem for video relightful harmonization by running existing image methods per frame and then cleaning the results with a dedicated deflickering model before feeding them to a video diffusion backbone. The asymmetric alpha mask conditioning is the other concrete addition for boundary handling.

The new engineering piece is that specific sequence: deflickering plus the mask trick on top of a diffusion model. The paper does a solid job spelling out why real paired video data under varying lights is basically impossible to collect at scale and why the jitter from naive frame-wise application breaks training. Using a mix of real and synthetic videos is a reasonable way to fill the gap.

The soft spot is the data quality assumption. If the deflickering leaves residual temporal artifacts or systematic biases, the diffusion model could learn those instead of actual lighting behavior. The abstract gives no numbers or ablation details on how clean the pairs end up or how much the deflickering actually helps the final output, so the full experiments need to show that the model extracts physically meaningful lighting rather than just smoothing. That concern is real but not fatal on its own.

This is for people doing video compositing or portrait editing pipelines where temporal consistency matters. A reader already working with diffusion models for video or with harmonization tasks would get practical value from the pipeline description.

It deserves peer review. The problem is concrete, the workaround is testable, and the claims can be checked against the experiments once the numbers are in front of a referee.

Referee Report

2 major / 2 minor

Summary. The paper presents HarmoVid, a method for relightful video portrait harmonization that adjusts foreground lighting (shadows, color tone, illumination) to match a target background. It addresses the lack of paired video data by generating training pairs via frame-by-frame application of existing image-based harmonizers followed by a novel lighting deflickering model to reduce temporal jitter; a video diffusion model is then trained on these deflickered pairs (augmented with real and synthetic videos) together with an asymmetric alpha mask conditioning scheme to improve boundary quality. Experiments are claimed to show improved temporal coherence, naturalness, cleaner boundaries, and physically meaningful lighting while preserving relighting expressiveness relative to prior image- and video-based methods.

Significance. If the central claims hold, the work would represent a meaningful step toward practical video-level relighting and harmonization, particularly by demonstrating a scalable route to paired training data via deflickering and by extending diffusion models to this task. The emphasis on physically meaningful lighting behavior and temporal stability addresses a clear gap between image harmonization and video applications.

major comments (2)

[§3.2] §3.2 (Data Generation): The central claim that the video diffusion model learns stable, physically meaningful lighting (rather than artifacts from the deflickering process) rests on the unverified assumption that deflickered frame-by-frame harmonized pairs are free of systematic residual jitter or bias; no ablation is described that isolates whether the model extracts lighting or merely reproduces the deflickering output.
[§4] §4 (Experiments): Performance claims of 'strong temporal coherence, naturalness, cleaner boundaries, and physically meaningful lighting' are stated without accompanying quantitative metrics (e.g., temporal consistency scores, lighting error metrics, or statistical significance tests), making it impossible to assess whether improvements are load-bearing or within noise.

minor comments (2)

Notation for the asymmetric alpha mask conditioning is introduced without an explicit equation or diagram showing how the mask asymmetry is implemented during training versus inference.
The abstract and introduction cite 'prior image-based and video-based harmonization methods' but the related-work section would benefit from a concise table comparing key architectural differences (e.g., diffusion vs. GAN, temporal modeling).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [§3.2] The central claim that the video diffusion model learns stable, physically meaningful lighting (rather than artifacts from the deflickering process) rests on the unverified assumption that deflickered frame-by-frame harmonized pairs are free of systematic residual jitter or bias; no ablation is described that isolates whether the model extracts lighting or merely reproduces the deflickering output.

Authors: We acknowledge the value of an explicit ablation to isolate the diffusion model's contribution from the deflickering step. The deflickering model is trained to minimize lighting jitter while retaining the target lighting from the image harmonizer, and the diffusion model is trained on the resulting pairs augmented with real and synthetic videos. Our comparisons to frame-by-frame baselines already show improved coherence, but to directly address the concern we will add an ablation in the revised Section 3.2 comparing training on deflickered versus jittery pairs, confirming that the model learns consistent lighting rather than simply reproducing deflickering outputs. revision: yes
Referee: [§4] Performance claims of 'strong temporal coherence, naturalness, cleaner boundaries, and physically meaningful lighting' are stated without accompanying quantitative metrics (e.g., temporal consistency scores, lighting error metrics, or statistical significance tests), making it impossible to assess whether improvements are load-bearing or within noise.

Authors: We agree that quantitative metrics would provide stronger support for the claims. The manuscript currently emphasizes qualitative results and user studies. In the revision we will add temporal consistency metrics (e.g., frame-to-frame warping error), lighting consistency measures relative to the background, and statistical significance testing for the user studies, reporting these in Section 4 alongside the existing comparisons. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a data-driven video diffusion model trained on pairs generated by applying image-based harmonization frame-by-frame followed by a lighting deflickering step, plus asymmetric alpha mask conditioning. No equations, derivations, or first-principles results are presented in the abstract or described method that reduce any claimed prediction or result to a quantity defined by the authors' own prior work or by construction. The central claims rest on empirical training and experimental evaluation rather than self-referential identities or fitted inputs renamed as predictions. This matches the reader's assessment that no derivation chain reduces the result to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly rests on the domain assumption that diffusion models can capture lighting physics from paired video data.

axioms (1)

domain assumption Frame-by-frame image harmonization plus a separate deflickering step yields training pairs that allow a video diffusion model to learn temporally stable and physically plausible lighting.
This premise is required for the data-preparation stage described in the abstract to serve as valid supervision.

pith-pipeline@v0.9.1-grok · 5726 in / 1354 out tokens · 32236 ms · 2026-06-29T13:34:50.979241+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 6 canonical work pages · 2 internal anchors

[1]

Lightctrl: Training-free controllable video re- lighting, 2025

Anonymous. Lightctrl: Training-free controllable video re- lighting, 2025. under review in ICLR2026. 3

2025
[2]

Real-time 3d-aware portrait video relighting, 2024

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, and Lin Gao. Real-time 3d-aware portrait video relighting, 2024. 2

2024
[3]

Real-time 3d-aware portrait video relighting, 2024

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, and Lin Gao. Real-time 3d-aware portrait video relighting, 2024. 3

2024
[4]

Text2relight: Creative portrait relighting with text guidance

Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, and Seungryul Baek. Text2relight: Creative portrait relighting with text guidance. InProceedings of the AAAI Conference on Artificial Intelligence, pages 1980– 1988, 2025. 1, 2, 4

1980
[5]

Synthlight: Por- trait relighting with diffusion model by learning to re-render synthetic faces, 2025

Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, and Zhixin Shu. Synthlight: Por- trait relighting with diffusion model by learning to re-render synthetic faces, 2025. 2

2025
[6]

Videocrafter2: Overcoming data limitations for high-quality video diffusion models, 2024

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, and Ying Shan. Videocrafter2: Overcoming data limitations for high-quality video diffusion models, 2024. 4

2024
[7]

Per- sonalized video relighting with an at-home light stage

Jun Myeong Choi, Max Christman, and Roni Sengupta. Per- sonalized video relighting with an at-home light stage. In European Conference on Computer Vision, pages 394–410. Springer, 2024. 2, 3

2024
[8]

Scribblelight: Single image indoor relighting with scribbles

Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhat- tad, and Roni Sengupta. Scribblelight: Single image indoor relighting with scribbles. InProceedings of the Computer Vi- sion and Pattern Recognition Conference, pages 5720–5731,
[9]

Relightvid: Temporal-consistent diffusion model for video relighting.arXiv preprint arXiv:2501.16330, 2025

Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein, and Dahua Lin. Relightvid: Temporal-consistent diffusion model for video relighting.arXiv preprint arXiv:2501.16330, 2025. 2, 3, 5, 6

work page arXiv 2025
[10]

Portrait video editing em- powered by multimodal generative priors, 2024

Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, and Juyong Zhang. Portrait video editing em- powered by multimodal generative priors, 2024. 3

2024
[11]

High-fidelity relightable monocular portrait animation with lighting- controllable video diffusion model, 2025

Mingtao Guo, Guanyu Xing, and Yanli Liu. High-fidelity relightable monocular portrait animation with lighting- controllable video diffusion model, 2025. 3

2025
[12]

Unirelight: Learning joint decomposition and synthesis for video relight- ing, 2025

Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski, Zan Gojcic, and Zian Wang. Unirelight: Learning joint decomposition and synthesis for video relight- ing, 2025. 3

2025
[13]

Beam: Bridging physically-based rendering and gaussian modeling for relightable volumetric video, 2025

Yu Hong, Yize Wu, Zhehao Shen, Chengcheng Guo, Yuheng Jiang, Yingliang Zhang, Jingyi Yu, and Lan Xu. Beam: Bridging physically-based rendering and gaussian modeling for relightable volumetric video, 2025. 3

2025
[14]

Towards high fidelity face relight- ing with realistic shadows, 2021

Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Towards high fidelity face relight- ing with realistic shadows, 2021. 2

2021
[15]

Compose: Comprehensive portrait shadow editing, 2024

Andrew Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yan- nick Hold-Geoffroy, Jae Shin Yoon, and Xiaoming Liu. Compose: Comprehensive portrait shadow editing, 2024. 2

2024
[16]

Neural gaffer: Relighting any object via diffusion.Advances in Neu- ral Information Processing Systems, 37:141129–141152,

Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural gaffer: Relighting any object via diffusion.Advances in Neu- ral Information Processing Systems, 37:141129–141152,
[17]

Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023

Levon Khachatryan, Andranik Movsisyan, Vahram Tade- vosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023. 4

2023
[18]

Switchlight: Co-design of physics- driven architecture and pre-training framework for human portrait relighting, 2024

Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. Switchlight: Co-design of physics- driven architecture and pre-training framework for human portrait relighting, 2024. 2

2024
[19]

Blind video deflickering by neural filtering with a flawed atlas

Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, and Qifeng Chen. Blind video deflickering by neural filtering with a flawed atlas. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10439– 10448, 2023. 4, 5, 6

2023
[20]

Diffusion- renderer: Neural inverse and forward rendering with video diffusion models, 2025

Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nan- dita Vijaykumar, Sanja Fidler, and Zian Wang. Diffusion- renderer: Neural inverse and forward rendering with video diffusion models, 2025. 3

2025
[21]

Relightable and animatable neural avatars from videos, 2023

Wenbin Lin, Chengwei Zheng, Jun-Hai Yong, and Feng Xu. Relightable and animatable neural avatars from videos, 2023. 3

2023
[22]

Illumicraft: Unified geometry and il- lumination diffusion for controllable video generation, 2025

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, and Ming-Hsuan Yang. Illumicraft: Unified geometry and il- lumination diffusion for controllable video generation, 2025. 3

2025
[23]

Tc-light: Temporally coherent generative render- ing for realistic world transfer, 2025

Yang Liu, Chuanchen Luo, Zimo Tang, Yingyan Li, Yuran Yang, Yuanyong Ning, Lue Fan, Zhaoxiang Zhang, and Jun- ran Peng. Tc-light: Temporally coherent generative render- ing for realistic world transfer, 2025. 2, 3

2025
[24]

Deep video harmonization with color map- ping consistency.arXiv preprint arXiv:2205.00687, 2022

Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, and Liqing Zhang. Deep video harmonization with color map- ping consistency.arXiv preprint arXiv:2205.00687, 2022. 5

work page arXiv 2022
[25]

Yiqun Mei, He Zhang, Xuaner Zhang, Jianming Zhang, Zhixin Shu, Yilin Wang, Zijun Wei, Shi Yan, HyunJoon Jung, and Vishal M. Patel. Lightpainter: Interactive portrait relighting with freehand scribble, 2023. 2

2023
[27]

Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, and Vishal M. Pa- tel. Holo-relighting: Controllable volumetric portrait relight- ing from a single image, 2024. 2

2024
[28]

Patel, and Paul Debevec

Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Lev- ent Tas ¸el, Ning Yu, Vishal M. Patel, and Paul Debevec. Lux post facto: Learning portrait performance relighting with conditional video diffusion and a hybrid dataset, 2025. 3

2025
[29]

Im- age quality assessment for performance evaluation of focus measure operators, 2016

Farida Memon, Mukhtiar Ali Unar, and Sheeraz Memon. Im- age quality assessment for performance evaluation of focus measure operators, 2016. 8

2016
[30]

Total relighting: learning to relight portraits for background replacement.ACM Trans

Rohit Pandey, Sergio Orts Escolano, Chloe Legendre, Chris- tian H ¨ane, Sofien Bouaziz, Christoph Rhemann, Paul De- bevec, and Sean Fanello. Total relighting: learning to relight portraits for background replacement.ACM Trans. Graph., 40(4), 2021. 2

2021
[31]

Perazzi, J

F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. In Computer Vision and Pattern Recognition, 2016. 1

2016
[32]

Difareli++: Diffusion face relighting with consistent cast shadows.arXiv e-prints, pages arXiv– 2304, 2023

Puntawat Ponglertnapakorn, Nontawat Tritrong, and Supa- sorn Suwajanakorn. Difareli++: Diffusion face relighting with consistent cast shadows.arXiv e-prints, pages arXiv– 2304, 2023. 2

2023
[33]

Over++: Generative video compositing for layer interaction effects.arXiv preprint arXiv:2512.19661, 2025

Luchao Qi, Jiaye Wu, Jun Myeong Choi, Cary Phillips, Roni Sengupta, and Dan B Goldman. Over++: Generative video compositing for layer interaction effects.arXiv preprint arXiv:2512.19661, 2025. 3

work page arXiv 2025
[34]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 5

2021
[36]

Lite2relight: 3d-aware single image portrait relighting

Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, and Chris- tian Theobalt. Lite2relight: 3d-aware single image portrait relighting. InSpecial Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers 24, page 1–12. ACM...

2024
[37]

Relightful harmonization: Lighting-aware portrait background replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, and He Zhang. Relightful harmonization: Lighting-aware portrait background replacement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6452–6462, 2024. 1, 2, 4, 5, 6

2024
[38]

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, et al. Grounded sam: Assembling open-world models for diverse visual tasks.arXiv preprint arXiv:2401.14159,

work page internal anchor Pith review Pith/arXiv arXiv
[39]

Half-body portrait relighting with overcomplete lighting representation.Computer Graphics Forum, 40(6): 371–381, 2021

Guoxian Song, Tat-Jen Cham, Jianfei Cai, and Jianmin Zheng. Half-body portrait relighting with overcomplete lighting representation.Computer Graphics Forum, 40(6): 371–381, 2021. 2

2021
[40]

Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi

Tiancheng Sun, Jonathan T. Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi. Single image portrait relighting.ACM Transactions on Graphics, 38(4): 1–12, 2019. 2

2019
[41]

Raft: Recurrent all-pairs field transforms for optical flow

Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. InEuropean conference on com- puter vision, pages 402–419. Springer, 2020. 5

2020
[42]

An image inpainting technique based on the fast marching method, 2004

Alexandru Telea. An image inpainting technique based on the fast marching method, 2004. 5

2004
[43]

Single image portrait relighting via explicit mul- tiple reflectance channel modeling.ACM Trans

Zhibo Wang, Xin Yu, Ming Lu, Quan Wang, Chen Qian, and Feng Xu. Single image portrait relighting via explicit mul- tiple reflectance channel modeling.ACM Trans. Graph., 39 (6), 2020. 2

2020
[44]

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7623–7633, 2023. 4

2023
[45]

Relightable and animatable neural avatar from sparse-view video, 2023

Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Relightable and animatable neural avatar from sparse-view video, 2023. 3

2023
[46]

Follow your motion: A generic temporal con- sistency portrait editing framework with trajectory guidance,

Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, and Jian Yang. Follow your motion: A generic temporal con- sistency portrait editing framework with trajectory guidance,
[47]

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiao- han Zhang, Guanyu Feng, et al. Cogvideox: Text-to-video diffusion models with an expert transformer.arXiv preprint arXiv:2408.06072, 2024. 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[48]

Learning to relight portrait images via a virtual light stage and synthetic-to-real adaptation.ACM Transactions on Graphics, 41(6):1–21,

Yu-Ying Yeh, Koki Nagano, Sameh Khamis, Jan Kautz, Ming-Yu Liu, and Ting-Chun Wang. Learning to relight portrait images via a virtual light stage and synthetic-to-real adaptation.ACM Transactions on Graphics, 41(6):1–21,
[49]

Generative portrait shadow removal, 2024

Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Xuaner Zhang, Yannick Hold-Geoffroy, Krishna Kumar Singh, and He Zhang. Generative portrait shadow removal, 2024. 1, 2

2024
[50]

Lumen: Consistent video relighting and harmo- nious background replacement with video generative mod- els, 2025

Jianshu Zeng, Yuxuan Liu, Yutong Feng, Chenxuan Miao, Zixiang Gao, Jiwang Qu, Jianzhang Zhang, Bin Wang, and Kun Yuan. Lumen: Consistent video relighting and harmo- nious background replacement with video generative mod- els, 2025. 2, 3

2025
[51]

Neural video portrait relighting in real-time via con- sistency modeling, 2021

Longwen Zhang, Qixuan Zhang, Minye Wu, Jingyi Yu, and Lan Xu. Neural video portrait relighting in real-time via con- sistency modeling, 2021. 2, 3

2021
[52]

Scal- ing in-the-wild training for diffusion-based illumination har- monization and editing by imposing consistent light trans- port

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Scal- ing in-the-wild training for diffusion-based illumination har- monization and editing by imposing consistent light trans- port. InThe Thirteenth International Conference on Learn- ing Representations, 2025. 1, 2, 5, 6

2025
[53]

Lumisculpt: Enabling consistent portrait lighting in video generation, 2025

Yuxin Zhang, Dandan Zheng, Biao Gong, Shiwen Wang, Jingdong Chen, Ming Yang, Weiming Dong, and Chang- sheng Xu. Lumisculpt: Enabling consistent portrait lighting in video generation, 2025. 2, 3

2025
[54]

Avid: Any-length video inpainting with diffusion model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, and Licheng Yu. Avid: Any-length video inpainting with diffusion model. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 7162–7172, 2024. 5

2024
[55]

Vidcraft3: Camera, object, and lighting control for image-to-video generation,

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, and Yanwei Fu. Vidcraft3: Camera, object, and lighting control for image-to-video generation,
[56]

Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, and David W. Jacobs. Deep single-image portrait relighting. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV), 2019. 2

2019
[57]

Deep single-image portrait relighting

Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, and David W Jacobs. Deep single-image portrait relighting. InProceed- ings of the IEEE/CVF international conference on computer vision, pages 7194–7202, 2019. 2

2019
[58]

Light-a-video: Training-free video relighting via progressive light fusion.arXiv preprint arXiv:2502.08590, 2025

Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Qidong Huang, Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, et al. Light-a-video: Training-free video relighting via progressive light fusion.arXiv preprint arXiv:2502.08590, 2025. 2, 3, 5, 6 HarmoVid: Relightful Video Portrait Harmonization Supplementary Material Along with this supplemental PD...

work page arXiv 2025

[1] [1]

Lightctrl: Training-free controllable video re- lighting, 2025

Anonymous. Lightctrl: Training-free controllable video re- lighting, 2025. under review in ICLR2026. 3

2025

[2] [2]

Real-time 3d-aware portrait video relighting, 2024

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, and Lin Gao. Real-time 3d-aware portrait video relighting, 2024. 2

2024

[3] [3]

Real-time 3d-aware portrait video relighting, 2024

Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, and Lin Gao. Real-time 3d-aware portrait video relighting, 2024. 3

2024

[4] [4]

Text2relight: Creative portrait relighting with text guidance

Junuk Cha, Mengwei Ren, Krishna Kumar Singh, He Zhang, Yannick Hold-Geoffroy, Seunghyun Yoon, HyunJoon Jung, Jae Shin Yoon, and Seungryul Baek. Text2relight: Creative portrait relighting with text guidance. InProceedings of the AAAI Conference on Artificial Intelligence, pages 1980– 1988, 2025. 1, 2, 4

1980

[5] [5]

Synthlight: Por- trait relighting with diffusion model by learning to re-render synthetic faces, 2025

Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, and Zhixin Shu. Synthlight: Por- trait relighting with diffusion model by learning to re-render synthetic faces, 2025. 2

2025

[6] [6]

Videocrafter2: Overcoming data limitations for high-quality video diffusion models, 2024

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, and Ying Shan. Videocrafter2: Overcoming data limitations for high-quality video diffusion models, 2024. 4

2024

[7] [7]

Per- sonalized video relighting with an at-home light stage

Jun Myeong Choi, Max Christman, and Roni Sengupta. Per- sonalized video relighting with an at-home light stage. In European Conference on Computer Vision, pages 394–410. Springer, 2024. 2, 3

2024

[8] [8]

Scribblelight: Single image indoor relighting with scribbles

Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhat- tad, and Roni Sengupta. Scribblelight: Single image indoor relighting with scribbles. InProceedings of the Computer Vi- sion and Pattern Recognition Conference, pages 5720–5731,

[9] [9]

Relightvid: Temporal-consistent diffusion model for video relighting.arXiv preprint arXiv:2501.16330, 2025

Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein, and Dahua Lin. Relightvid: Temporal-consistent diffusion model for video relighting.arXiv preprint arXiv:2501.16330, 2025. 2, 3, 5, 6

work page arXiv 2025

[10] [10]

Portrait video editing em- powered by multimodal generative priors, 2024

Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, and Juyong Zhang. Portrait video editing em- powered by multimodal generative priors, 2024. 3

2024

[11] [11]

High-fidelity relightable monocular portrait animation with lighting- controllable video diffusion model, 2025

Mingtao Guo, Guanyu Xing, and Yanli Liu. High-fidelity relightable monocular portrait animation with lighting- controllable video diffusion model, 2025. 3

2025

[12] [12]

Unirelight: Learning joint decomposition and synthesis for video relight- ing, 2025

Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski, Zan Gojcic, and Zian Wang. Unirelight: Learning joint decomposition and synthesis for video relight- ing, 2025. 3

2025

[13] [13]

Beam: Bridging physically-based rendering and gaussian modeling for relightable volumetric video, 2025

Yu Hong, Yize Wu, Zhehao Shen, Chengcheng Guo, Yuheng Jiang, Yingliang Zhang, Jingyi Yu, and Lan Xu. Beam: Bridging physically-based rendering and gaussian modeling for relightable volumetric video, 2025. 3

2025

[14] [14]

Towards high fidelity face relight- ing with realistic shadows, 2021

Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, and Xiaoming Liu. Towards high fidelity face relight- ing with realistic shadows, 2021. 2

2021

[15] [15]

Compose: Comprehensive portrait shadow editing, 2024

Andrew Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yan- nick Hold-Geoffroy, Jae Shin Yoon, and Xiaoming Liu. Compose: Comprehensive portrait shadow editing, 2024. 2

2024

[16] [16]

Neural gaffer: Relighting any object via diffusion.Advances in Neu- ral Information Processing Systems, 37:141129–141152,

Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural gaffer: Relighting any object via diffusion.Advances in Neu- ral Information Processing Systems, 37:141129–141152,

[17] [17]

Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023

Levon Khachatryan, Andranik Movsisyan, Vahram Tade- vosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. Text2video-zero: Text-to- image diffusion models are zero-shot video generators, 2023. 4

2023

[18] [18]

Switchlight: Co-design of physics- driven architecture and pre-training framework for human portrait relighting, 2024

Hoon Kim, Minje Jang, Wonjun Yoon, Jisoo Lee, Donghyun Na, and Sanghyun Woo. Switchlight: Co-design of physics- driven architecture and pre-training framework for human portrait relighting, 2024. 2

2024

[19] [19]

Blind video deflickering by neural filtering with a flawed atlas

Chenyang Lei, Xuanchi Ren, Zhaoxiang Zhang, and Qifeng Chen. Blind video deflickering by neural filtering with a flawed atlas. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10439– 10448, 2023. 4, 5, 6

2023

[20] [20]

Diffusion- renderer: Neural inverse and forward rendering with video diffusion models, 2025

Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nan- dita Vijaykumar, Sanja Fidler, and Zian Wang. Diffusion- renderer: Neural inverse and forward rendering with video diffusion models, 2025. 3

2025

[21] [21]

Relightable and animatable neural avatars from videos, 2023

Wenbin Lin, Chengwei Zheng, Jun-Hai Yong, and Feng Xu. Relightable and animatable neural avatars from videos, 2023. 3

2023

[22] [22]

Illumicraft: Unified geometry and il- lumination diffusion for controllable video generation, 2025

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, and Ming-Hsuan Yang. Illumicraft: Unified geometry and il- lumination diffusion for controllable video generation, 2025. 3

2025

[23] [23]

Tc-light: Temporally coherent generative render- ing for realistic world transfer, 2025

Yang Liu, Chuanchen Luo, Zimo Tang, Yingyan Li, Yuran Yang, Yuanyong Ning, Lue Fan, Zhaoxiang Zhang, and Jun- ran Peng. Tc-light: Temporally coherent generative render- ing for realistic world transfer, 2025. 2, 3

2025

[24] [24]

Deep video harmonization with color map- ping consistency.arXiv preprint arXiv:2205.00687, 2022

Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, and Liqing Zhang. Deep video harmonization with color map- ping consistency.arXiv preprint arXiv:2205.00687, 2022. 5

work page arXiv 2022

[25] [25]

Yiqun Mei, He Zhang, Xuaner Zhang, Jianming Zhang, Zhixin Shu, Yilin Wang, Zijun Wei, Shi Yan, HyunJoon Jung, and Vishal M. Patel. Lightpainter: Interactive portrait relighting with freehand scribble, 2023. 2

2023

[26] [27]

Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, and Vishal M. Pa- tel. Holo-relighting: Controllable volumetric portrait relight- ing from a single image, 2024. 2

2024

[27] [28]

Patel, and Paul Debevec

Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Lev- ent Tas ¸el, Ning Yu, Vishal M. Patel, and Paul Debevec. Lux post facto: Learning portrait performance relighting with conditional video diffusion and a hybrid dataset, 2025. 3

2025

[28] [29]

Im- age quality assessment for performance evaluation of focus measure operators, 2016

Farida Memon, Mukhtiar Ali Unar, and Sheeraz Memon. Im- age quality assessment for performance evaluation of focus measure operators, 2016. 8

2016

[29] [30]

Total relighting: learning to relight portraits for background replacement.ACM Trans

Rohit Pandey, Sergio Orts Escolano, Chloe Legendre, Chris- tian H ¨ane, Sofien Bouaziz, Christoph Rhemann, Paul De- bevec, and Sean Fanello. Total relighting: learning to relight portraits for background replacement.ACM Trans. Graph., 40(4), 2021. 2

2021

[30] [31]

Perazzi, J

F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. In Computer Vision and Pattern Recognition, 2016. 1

2016

[31] [32]

Difareli++: Diffusion face relighting with consistent cast shadows.arXiv e-prints, pages arXiv– 2304, 2023

Puntawat Ponglertnapakorn, Nontawat Tritrong, and Supa- sorn Suwajanakorn. Difareli++: Diffusion face relighting with consistent cast shadows.arXiv e-prints, pages arXiv– 2304, 2023. 2

2023

[32] [33]

Over++: Generative video compositing for layer interaction effects.arXiv preprint arXiv:2512.19661, 2025

Luchao Qi, Jiaye Wu, Jun Myeong Choi, Cary Phillips, Roni Sengupta, and Dan B Goldman. Over++: Generative video compositing for layer interaction effects.arXiv preprint arXiv:2512.19661, 2025. 3

work page arXiv 2025

[33] [34]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 5

2021

[34] [36]

Lite2relight: 3d-aware single image portrait relighting

Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, and Chris- tian Theobalt. Lite2relight: 3d-aware single image portrait relighting. InSpecial Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers 24, page 1–12. ACM...

2024

[35] [37]

Relightful harmonization: Lighting-aware portrait background replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, and He Zhang. Relightful harmonization: Lighting-aware portrait background replacement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6452–6462, 2024. 1, 2, 4, 5, 6

2024

[36] [38]

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, et al. Grounded sam: Assembling open-world models for diverse visual tasks.arXiv preprint arXiv:2401.14159,

work page internal anchor Pith review Pith/arXiv arXiv

[37] [39]

Half-body portrait relighting with overcomplete lighting representation.Computer Graphics Forum, 40(6): 371–381, 2021

Guoxian Song, Tat-Jen Cham, Jianfei Cai, and Jianmin Zheng. Half-body portrait relighting with overcomplete lighting representation.Computer Graphics Forum, 40(6): 371–381, 2021. 2

2021

[38] [40]

Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi

Tiancheng Sun, Jonathan T. Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi. Single image portrait relighting.ACM Transactions on Graphics, 38(4): 1–12, 2019. 2

2019

[39] [41]

Raft: Recurrent all-pairs field transforms for optical flow

Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. InEuropean conference on com- puter vision, pages 402–419. Springer, 2020. 5

2020

[40] [42]

An image inpainting technique based on the fast marching method, 2004

Alexandru Telea. An image inpainting technique based on the fast marching method, 2004. 5

2004

[41] [43]

Single image portrait relighting via explicit mul- tiple reflectance channel modeling.ACM Trans

Zhibo Wang, Xin Yu, Ming Lu, Quan Wang, Chen Qian, and Feng Xu. Single image portrait relighting via explicit mul- tiple reflectance channel modeling.ACM Trans. Graph., 39 (6), 2020. 2

2020

[42] [44]

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7623–7633, 2023. 4

2023

[43] [45]

Relightable and animatable neural avatar from sparse-view video, 2023

Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Relightable and animatable neural avatar from sparse-view video, 2023. 3

2023

[44] [46]

Follow your motion: A generic temporal con- sistency portrait editing framework with trajectory guidance,

Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, and Jian Yang. Follow your motion: A generic temporal con- sistency portrait editing framework with trajectory guidance,

[45] [47]

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiao- han Zhang, Guanyu Feng, et al. Cogvideox: Text-to-video diffusion models with an expert transformer.arXiv preprint arXiv:2408.06072, 2024. 4

work page internal anchor Pith review Pith/arXiv arXiv 2024

[46] [48]

Learning to relight portrait images via a virtual light stage and synthetic-to-real adaptation.ACM Transactions on Graphics, 41(6):1–21,

Yu-Ying Yeh, Koki Nagano, Sameh Khamis, Jan Kautz, Ming-Yu Liu, and Ting-Chun Wang. Learning to relight portrait images via a virtual light stage and synthetic-to-real adaptation.ACM Transactions on Graphics, 41(6):1–21,

[47] [49]

Generative portrait shadow removal, 2024

Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Xuaner Zhang, Yannick Hold-Geoffroy, Krishna Kumar Singh, and He Zhang. Generative portrait shadow removal, 2024. 1, 2

2024

[48] [50]

Lumen: Consistent video relighting and harmo- nious background replacement with video generative mod- els, 2025

Jianshu Zeng, Yuxuan Liu, Yutong Feng, Chenxuan Miao, Zixiang Gao, Jiwang Qu, Jianzhang Zhang, Bin Wang, and Kun Yuan. Lumen: Consistent video relighting and harmo- nious background replacement with video generative mod- els, 2025. 2, 3

2025

[49] [51]

Neural video portrait relighting in real-time via con- sistency modeling, 2021

Longwen Zhang, Qixuan Zhang, Minye Wu, Jingyi Yu, and Lan Xu. Neural video portrait relighting in real-time via con- sistency modeling, 2021. 2, 3

2021

[50] [52]

Scal- ing in-the-wild training for diffusion-based illumination har- monization and editing by imposing consistent light trans- port

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Scal- ing in-the-wild training for diffusion-based illumination har- monization and editing by imposing consistent light trans- port. InThe Thirteenth International Conference on Learn- ing Representations, 2025. 1, 2, 5, 6

2025

[51] [53]

Lumisculpt: Enabling consistent portrait lighting in video generation, 2025

Yuxin Zhang, Dandan Zheng, Biao Gong, Shiwen Wang, Jingdong Chen, Ming Yang, Weiming Dong, and Chang- sheng Xu. Lumisculpt: Enabling consistent portrait lighting in video generation, 2025. 2, 3

2025

[52] [54]

Avid: Any-length video inpainting with diffusion model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, and Licheng Yu. Avid: Any-length video inpainting with diffusion model. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 7162–7172, 2024. 5

2024

[53] [55]

Vidcraft3: Camera, object, and lighting control for image-to-video generation,

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, and Yanwei Fu. Vidcraft3: Camera, object, and lighting control for image-to-video generation,

[54] [56]

Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, and David W. Jacobs. Deep single-image portrait relighting. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision (ICCV), 2019. 2

2019

[55] [57]

Deep single-image portrait relighting

Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, and David W Jacobs. Deep single-image portrait relighting. InProceed- ings of the IEEE/CVF international conference on computer vision, pages 7194–7202, 2019. 2

2019

[56] [58]

Light-a-video: Training-free video relighting via progressive light fusion.arXiv preprint arXiv:2502.08590, 2025

Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Qidong Huang, Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, et al. Light-a-video: Training-free video relighting via progressive light fusion.arXiv preprint arXiv:2502.08590, 2025. 2, 3, 5, 6 HarmoVid: Relightful Video Portrait Harmonization Supplementary Material Along with this supplemental PD...

work page arXiv 2025