VDFP: Video Deflickering with Flicker-banding Priors

Libo Zhu; Xiaokang Yang; Yulun Zhang; Zhiyi Zhou; Zihan Zhou

arxiv: 2605.21079 · v2 · pith:I2EJYVKYnew · submitted 2026-05-20 · 💻 cs.CV

Zhiyi Zhou, Libo Zhu, Zihan Zhou, Yulun Zhang, Xiaokang Yang

Pith reviewed 2026-05-22 09:45 UTC · model grok-4.3

classification 💻 cs.CV

keywords: video deflickering, flicker banding, rolling shutter, degradation field, screen capture, video restoration, temporal consistency, DeViD dataset

The pith

VDFP removes complex banding from smartphone recordings of digital screens by synthesizing realistic flicker patterns and tracking gradual luminance changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix severe periodic brightness fluctuations that appear when phones capture screens, a problem that breaks existing video restoration techniques and leaves artifacts or blurs textures. It does this by first building the DeViD collection of real screen recordings and then introducing a generation model whose degradation field step recreates the multi-banding that rolling shutters produce. A continuous prior perception module, trained with a flicker-aware error, replaces binary detection so the model can follow smooth brightness transitions across space and time. If the approach holds, common screen videos could be cleaned up while retaining sharp details and frame-to-frame smoothness, something prior methods have not achieved.

Core claim

VDFP constructs the DeViD real-world dataset and introduces Degradation Field Modeling (DFM), based on the rolling shutter mechanism, to synthesize complex multi-banding scenarios. It pairs DFM with spatial-temporal continuous prior perception (CPP) optimized via Flicker-Aware Mean Squared Error (FA-MSE). Zero-initializing an augmented input layer lets the model keep its pre-trained generative priors while incorporating the spatial-temporal prior perception. The restored videos are reported to be free of banding, to keep high-fidelity spatial details and temporal consistency, and to outperform prior restoration methods.

What carries the argument

Degradation Field Modeling (DFM) that generates multi-banding from rolling shutter mismatches, together with spatial-temporal continuous prior perception (CPP) that learns luminance transitions through Flicker-Aware Mean Squared Error instead of binary segmentation.
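The link between a rolling shutter and spatial bands can be sketched in a few lines. The example below is not the paper's DFM: it assumes a screen dimmed by a square-wave (PWM-like) signal and a sensor that starts exposing each row a fixed `line_time` after the previous one. The function names and the parameters `line_time`, `exposure`, `freq`, and `duty` are illustrative assumptions.

```python
def pwm_on_fraction(t_start, t_end, freq, duty):
    """Fraction of [t_start, t_end] during which a square wave with the
    given frequency and duty cycle is 'on' (exact integration)."""
    period = 1.0 / freq
    on_len = duty * period

    def on_time_until(t):
        # Accumulated 'on' time in [0, t].
        full, rem = divmod(t, period)
        return full * on_len + min(rem, on_len)

    return (on_time_until(t_end) - on_time_until(t_start)) / (t_end - t_start)


def row_gains(num_rows, line_time, exposure, freq, duty, t0=0.0):
    """Per-row luminance gain; row r starts exposing at t0 + r * line_time
    (assumed rolling-shutter timing model)."""
    gains = []
    for r in range(num_rows):
        start = t0 + r * line_time
        gains.append(pwm_on_fraction(start, start + exposure, freq, duty))
    return gains
```

Under this toy model, an exposure equal to a whole number of flicker periods gives every row the same gain, so no bands appear. A shorter exposure gives rows different gains depending on where their exposure window falls in the flicker cycle, which shows up as horizontal bands.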

If this is right

Screen-captured videos can be restored without residual periodic artifacts or texture over-smoothing.
Spatial details stay sharp while temporal consistency across frames is preserved.
Pre-trained generative models retain their knowledge when an extra input layer is added and zero-initialized.
Complex multi-banding cases become tractable through explicit rolling-shutter synthesis rather than generic noise models.
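The zero-initialization point can be illustrated with a toy linear layer. Widening the input with extra channels whose weights start at zero leaves the layer's initial output identical to the pre-trained one. This is a plain-Python sketch; the helper names, the tiny weight matrix, and the single `prior` channel are hypothetical, and the layer type used in the paper is not stated here.

```python
def linear(weights, bias, x):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def widen_input_zero_init(weights, extra_inputs):
    """Append zero columns so the layer accepts extra input channels
    without changing its output at initialization."""
    return [row + [0.0] * extra_inputs for row in weights]


pretrained_w = [[0.5, -1.0], [2.0, 0.25]]  # stand-in for pre-trained weights
pretrained_b = [0.1, -0.2]
x = [1.0, 2.0]
prior = [0.7]  # hypothetical extra conditioning channel

widened_w = widen_input_zero_init(pretrained_w, len(prior))
before = linear(pretrained_w, pretrained_b, x)
after = linear(widened_w, pretrained_b, x + prior)
```

Here `before` and `after` agree, so fine-tuning starts from the pre-trained behavior and the new columns only gain influence as training updates them.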

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The DeViD dataset could become a standard testbed for any future work on periodic artifact removal in mobile video.
The same degradation modeling idea might transfer to other hardware-synchronization issues such as rolling-shutter distortion in fast motion.
Wider adoption could improve quality in screen-sharing recordings and digital preservation of displayed content.

Load-bearing premise

The rolling shutter degradation field can generate synthetic banding patterns that closely match the structured fluctuations seen in actual smartphone screen captures.

What would settle it

Running VDFP on a fresh collection of real phone-recorded screen videos and observing that visible banding remains or that fine spatial textures are lost or smoothed would show the method does not fully solve the problem.
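One way to put a number on "visible banding remains" is sketched below. It assumes a static scene and a fixed camera, so that any frame-to-frame change in a row's mean brightness comes from banding rather than content. Frames are nested lists of luminance values, and the function names are hypothetical. This is a simple diagnostic, not a metric taken from the paper.

```python
from statistics import mean, pstdev


def row_means(frame):
    """Mean luminance of each row of a frame (list of rows)."""
    return [mean(row) for row in frame]


def temporal_row_flicker(frames):
    """Average, over rows, of the temporal standard deviation of each
    row's mean luminance. Zero for a perfectly stable static scene."""
    profiles = [row_means(frame) for frame in frames]
    per_row_series = zip(*profiles)  # one brightness time series per row
    return mean(pstdev(series) for series in per_row_series)
```

A restored clip of static content should score close to zero. A clearly higher score on the output than on a clean reference suggests residual bands moving across frames.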

Figures

Figures reproduced from arXiv: 2605.21079 by Libo Zhu, Xiaokang Yang, Yulun Zhang, Zhiyi Zhou, Zihan Zhou.

**Figure 2.** Visual comparison between VDFP and other methods trained on our real-world dataset

**Figure 3.** Overview of the data acquisition and alignment of DeViD (left) and examples of some ...

**Figure 4.** Overview of our proposed model (VDFP). The first stage is the training process of the ...

**Figure 5.** Overview of our data simulation pipeline.

**Figure 6.** Overview of results of our simulation pipeline. The row and column respectively mean ...

**Figure 7.** Examples of predicted confidence maps of our real-world dataset (DeViD). Colors in the ...

**Figure 8.** Spatio-temporal kinematic tracking and phase alignment. (a-b) Y-T spatio-temporal slices, ...

**Figure 9.** Visual comparison between VDFP and other methods on our real-world dataset DeViD.

**Figure 10.** Visual comparisons of ablation study. The figures show three sets of examples, and from ...

**Figure 11.** More examples of our real-world dataset (DeViD) in various scenes, such as sports, ...

**Figure 12.** More visual comparison between VDFP and other methods on our real-world dataset

**Figure 13.** More visual comparisons of ablation study. The figures show three sets of examples, and ...

Original abstract

Capturing digital screens with smartphones frequently induces severe banding due to hardware synchronization mismatches. Existing video restoration methods struggle with these structured, periodic luminance fluctuations, often resulting in residual artifacts or over-smoothed textures. We firstly construct DeViD, a real-world dataset in various scenes to deal with the lack of available datasets. Then we propose VDFP (Video Deflickering with Flicker-banding Priors), a novel perception-guided generation framework. First, we introduce a Degradation Field Modeling Based on Rolling Shutter Mechanism (DFM) capable of synthesizing complex multi-banding scenarios. Second, we present a spatial-temporal continuous prior perception (CPP). Unlike traditional binary segmentation, this module is optimized via a Flicker-Aware Mean Squared Error (FA-MSE) to capture the luminance transitions. By zero-initializing an augmented input layer, our model preserves pre-trained generative priors as well as spatial-temporal prior perception. Extensive experiments demonstrate that VDFP significantly outperforms other methods, eliminating complex banding with high-fidelity spatial details and temporal consistency. Our dataset and code will be released at https://github.com/ZhiyiZZhou/VDFP.
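The abstract does not give the FA-MSE formula, so the sketch below only shows the general idea of supervising a continuous map instead of a binary mask. The soft target (relative luminance drop), the weighting `1 + alpha * target`, and all names are assumptions made for illustration, not the paper's definition.

```python
def continuous_target(clean, flickered, eps=1e-6):
    """Soft per-pixel label in [0, 1]: relative luminance drop, clipped.
    (Assumed definition, for illustration only.)"""
    return [min(1.0, max(0.0, (c - f) / (c + eps))) for c, f in zip(clean, flickered)]


def weighted_mse(pred, target, alpha=4.0):
    """MSE with per-pixel weight 1 + alpha * target, so errors inside
    strongly darkened regions cost more than errors in clean regions."""
    weights = [1.0 + alpha * t for t in target]
    num = sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))
    return num / sum(weights)


clean = [0.8, 0.8, 0.8, 0.8]
flickered = [0.8, 0.6, 0.4, 0.8]
target = continuous_target(clean, flickered)        # graded values
binary = [1.0 if t > 0 else 0.0 for t in target]    # hard mask for contrast
```

The hard mask labels the lightly and heavily darkened pixels identically, while the soft target keeps their different strengths. This is the kind of gradual transition a continuous map can represent and a binary segmentation cannot.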

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes VDFP for deflickering videos with flicker-banding from digital screen captures using smartphones. It first builds the DeViD dataset for real-world scenes. The method includes Degradation Field Modeling (DFM) based on rolling shutter to synthesize multi-banding, spatial-temporal continuous prior perception (CPP) optimized with Flicker-Aware MSE for luminance transitions, and zero-initialization to keep pre-trained priors. It claims extensive experiments show significant outperformance over other methods in removing banding with high fidelity and consistency.

Significance. This work tackles a common practical issue in video processing. The new dataset and the continuous prior approach could be impactful if properly validated. Credit is given for releasing the dataset and code, and for attempting to use pre-trained models efficiently.

major comments (2)

[Abstract] The central claim that VDFP 'significantly outperforms other methods' is not backed by any numerical results, ablations, or statistical analysis in the text. This makes it impossible to evaluate the strength of the contribution without the full experimental details.
[DFM and CPP sections] The assumption that DFM accurately synthesizes real-world multi-banding is not validated quantitatively against DeViD statistics (e.g., no comparison of periodicity or spatial characteristics). This is load-bearing as the training of CPP relies on these synthetic priors, potentially leading to domain shift in real applications.

minor comments (2)

[Overall] The manuscript would benefit from clearer definitions of terms like 'Flicker-banding Priors' and how they are incorporated.
[References] Ensure all related work on video deflickering and rolling shutter modeling is cited.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have addressed each major comment below and revised the manuscript accordingly to strengthen the presentation and validation of our contributions.

Referee: [Abstract] The central claim that VDFP 'significantly outperforms other methods' is not backed by any numerical results, ablations, or statistical analysis in the text. This makes it impossible to evaluate the strength of the contribution without the full experimental details.

Authors: We appreciate this observation. The experimental section of the manuscript (Section 4) contains quantitative comparisons, ablation studies, and statistical analyses (including PSNR, SSIM, and user-study results) demonstrating the outperformance. To directly support the abstract claim, we have revised the abstract to include a brief summary of key quantitative improvements and explicit references to the detailed results and ablations in Section 4.
Revision: yes

Referee: [DFM and CPP sections] The assumption that DFM accurately synthesizes real-world multi-banding is not validated quantitatively against DeViD statistics (e.g., no comparison of periodicity or spatial characteristics). This is load-bearing as the training of CPP relies on these synthetic priors, potentially leading to domain shift in real applications.

Authors: We agree that quantitative validation of the DFM synthesis against DeViD is important given its role in training CPP. In the revised manuscript, we have added a new analysis subsection that compares the periodicity (via frequency spectra) and spatial characteristics (via banding pattern distributions and statistics) of DFM-synthesized degradations to real samples from the DeViD dataset. This addition directly addresses concerns about domain shift.
Revision: yes
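A periodicity comparison of the kind described in the second response could look roughly like the sketch below. It assumes each sample has been reduced to a 1-D profile of row-mean luminance and compares the dominant band period of a synthetic profile with that of a real one. The function names and the use of a direct discrete Fourier transform are illustrative choices, not details taken from the manuscript.

```python
import cmath
import math


def dominant_period(profile):
    """Period, in rows, of the strongest non-DC frequency in a 1-D profile,
    found with a direct O(n^2) discrete Fourier transform."""
    n = len(profile)
    avg = sum(profile) / n
    centered = [v - avg for v in profile]  # remove the DC component
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


def period_gap(synthetic_profile, real_profile):
    """Absolute difference between the dominant band periods, in rows."""
    return abs(dominant_period(synthetic_profile) - dominant_period(real_profile))
```

A small gap across many synthetic and real pairs would be one piece of evidence that the synthesis matches real band spacing. It says nothing about band contrast or edge softness, which would need their own statistics.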

Circularity Check

0 steps flagged

No significant circularity in the method derivation

The paper constructs the DeViD real-world dataset, introduces DFM to synthesize multi-banding degradations from a rolling-shutter physical model, defines CPP as a continuous prior-perception module trained with the custom FA-MSE loss, and preserves external pre-trained generative priors via zero-initialization of an input layer. These steps form an independent modeling and training pipeline whose outputs are evaluated on held-out real captures; no equation or claim reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation chain within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the fidelity of the rolling-shutter-based synthesis and the effectiveness of the new perception module; these are introduced without independent external validation in the abstract. Pre-trained generative priors are treated as given from prior literature.

axioms (1)

domain assumption: Rolling shutter mechanism in cameras produces the observed banding patterns in screen captures.
Invoked to justify the DFM synthesis of training data.

invented entities (1)

Degradation Field · no independent evidence
purpose: To model and synthesize complex multi-banding scenarios from rolling shutter effects.
New construct introduced for data generation in the absence of sufficient real paired data.

pith-pipeline@v0.9.0 · 5744 in / 1267 out tokens · 47115 ms · 2026-05-22T09:45:33.725550+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: Degradation Field Modeling (DFM) based on the rolling shutter mechanism... kinematic spatiotemporal modeling... dual-layer banding fusion... smoothstep function

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Flicker-Aware Mean Squared Error (FA-MSE)... continuous artifact confidence map

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Align your latents: High-resolution video synthesis with latent diffusion models

Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align your latents: High-resolution video synthesis with latent diffusion models. InCVPR, 2023

work page 2023

[2] [2]

Dove: Efficient one-step diffusion model for real-world video super-resolution

Zheng Chen, Zichen Zou, Kewei Zhang, Xiongfei Su, Xin Yuan, Yong Guo, and Yulun Zhang. Dove: Efficient one-step diffusion model for real-world video super-resolution. InNeurIPS, 2025

work page 2025

[3] [3]

Spatiotem- poral blind-spot network with calibrated flow alignment for self-supervised video denoising

Zikang Chen, Tao Jiang, Xiaowan Hu, Wang Zhang, Huaqiu Li, and Haoqian Wang. Spatiotem- poral blind-spot network with calibrated flow alignment for self-supervised video denoising. In AAAI, 2025

work page 2025

[4] [4]

Video demoireing with relation-based temporal consistency

Peng Dai, Xin Yu, Lan Ma, Baoheng Zhang, Jia Li, Wenbo Li, Jiajun Shen, and Xiaojuan Qi. Video demoireing with relation-based temporal consistency. InCVPR, 2022

work page 2022

[5] [5]

Image quality assessment: Unifying structure and texture similarity.TPAMI, 2022

Keyan Ding, Kede Ma, Shiqi Wang, and Eero P Simoncelli. Image quality assessment: Unifying structure and texture similarity.TPAMI, 2022

work page 2022

[6] [6]

Cogview2: faster and better text-to-image generation via hierarchical transformers

Ming Ding, Wendi Zheng, Wenyi Hong, and Jie Tang. Cogview2: faster and better text-to-image generation via hierarchical transformers. InNeurIPS, 2022

work page 2022

[7] [7]

Video demoireing using focused-defocused dual-camera system.TPAMI, 2025

Xuan Dong, Xiangyuan Sun, Xia Wang, Jian Song, Ya Li, and Weixin Li. Video demoireing using focused-defocused dual-camera system.TPAMI, 2025

work page 2025

[8] [8]

Woodhead Publishing, 2019

Daniel Durini.High performance silicon imaging: Fundamentals and applications of CMOS and CCD sensors. Woodhead Publishing, 2019

work page 2019

[9] [9]

Taming transformers for high-resolution image synthesis

Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis. InCVPR, 2021

work page 2021

[10] [10]

Organic light-emitting diode (oled) technology: materials, devices and display technologies.Polymer international, 2006

Bernard Geffroy, Philippe Le Roy, and Christophe Prat. Organic light-emitting diode (oled) technology: materials, devices and display technologies.Polymer international, 2006

work page 2006

[11] [11]

Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InNeurIPS, 2014

work page 2014

[12] [12]

Fhde2net: Full high definition demoireing network

Bin He, Ce Wang, Boxin Shi, and Ling-Yu Duan. Fhde2net: Full high definition demoireing network. InECCV, 2020

work page 2020

[13] [13]

Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models. InNeurIPS, 2022

work page 2022

[14] [14]

Alias-free generative adversarial networks

Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. InNeurIPS, 2021

work page 2021

[15] [15]

A style-based generator architecture for generative adversarial networks.TPAMI, 2021

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks.TPAMI, 2021

work page 2021

[16] [16]

Musiq: Multi-scale image quality transformer

Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. InICCV, 2021

work page 2021

[17] [17]

Generative adversarial networks

Moez Krichen. Generative adversarial networks. InICCCNT, 2023

work page 2023

[18] [18]

Learning blind video temporal consistency

Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, and Ming-Hsuan Yang. Learning blind video temporal consistency. InECCV, 2018

work page 2018

[19] [19]

Vmaf: The journey continues.Netflix Technology Blog, 25(1), 2018

Zhi Li, Christos Bampis, Julie Novak, Anne Aaron, Kyle Swanson, Anush Moorthy, and JD Cock. Vmaf: The journey continues.Netflix Technology Blog, 25(1), 2018

work page 2018

[20] [20]

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, and Lichao Sun. Sora: A review on background, technology, limitations, and opportunities of large vision models.arXiv preprint arXiv:2402.17177, 2024. 10

work page internal anchor Pith review Pith/arXiv arXiv 2024

[21] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2017

[22] Robert A. Meyers, editor. Encyclopedia of Physical Science and Technology. Academic Press, third edition, 2001

[23] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spatial domain. TIP, 2012

[24] B. Murugesakumar, Aakash S, A Arunmanikandan, M Chithra, and M Dhasaranjan. U-net convolutional networks for real-time biomedical image segmentation and anomaly detection. In ICONSTEM, 2025

[25] Gyeongrok Oh, Sungjune Kim, Heon Gu, Sang Ho Yoon, Jinkyu Kim, and Sangpil Kim. Fpanet: Frequency-based video demoireing using frame-level post alignment. In Neural Networks, 2025

[26] Lishen Qu, Zhihao Liu, Shihao Zhou, Yaqi Luo, Jie Liang, Hui Zeng, Lei Zhang, and Jufeng Yang. Burstdeflicker: A benchmark dataset for flicker removal in dynamic scenes. In NeurIPS, 2025

[27] Lishen Qu, Shihao Zhou, Jie Liang, Hui Zeng, Lei Zhang, and Jufeng Yang. It takes two: A duet of periodicity and directionality for burst flicker removal. In CVPR, 2026

[28] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In ICML, 2021

[29] Yujing Sun, Lingchen Sun, Shuaizheng Liu, Rongyuan Wu, Zhengqiang Zhang, and Lei Zhang. One-step diffusion for detail-rich and temporally consistent video super-resolution. In NeurIPS, 2025

[30] Jeahun Sung, Changhyun Roh, Chanho Eom, and Jihyong Oh. Mocha-former: Moiré-conditioned hybrid adaptive transformer for video demoiréing. Neurocomputing, 2025

[31] Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, and Jiaya Jia. Detail-revealing deep video super-resolution. In ICCV, 2017

[32] Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In AAAI, 2023

[33] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. TIP, 2004

[34] Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. In ICCV, 2023

[35] Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, and Ying Tai. Star: Spatial-temporal augmentation with text-to-video models for real-world video super-resolution. In ICCV, 2025

[36] Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, and Yu-Gang Jiang. A survey on video diffusion models. arXiv preprint arXiv:2310.10647, 2024

[37] Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, and Jiantao Zhou. Alignment-free raw video demoireing. arXiv preprint arXiv:2408.10679, 2025

[38] Shuning Xu, Binbin Song, Xiangyu Chen, and Jiantao Zhou. Direction-aware video demoireing with temporal-guided bilateral learning. arXiv preprint arXiv:2308.13388, 2023

[39] Qirui Yang, Fangpu Zhang, Yeying Jin, Qihua Cheng, Pengtao Jiang, Huanjing Yue, and Jingyu Yang. Dsdnet: Raw domain demoiréing via dual color-space synergy. In ACM MM, 2025

[40] Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, and Jiayi Ma. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations. In ICCV, 2019

[41] Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Jiajun Shen, Jia Li, and Xiaojuan Qi. Towards efficient and scale-robust ultra-high-definition image demoiréing. In ECCV, 2022

[42] Huanjing Yue, Yijia Cheng, Xin Liu, and Jingyu Yang. Recaptured raw screen image and video demoireing via channel and spatial modulations. arXiv preprint arXiv:2310.20332, 2023

[43] Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M Gambaruto, and Tilo Burghardt. Video-swinunet: Spatio-temporal deep learning framework for vfss instance segmentation. arXiv preprint arXiv:2302.11325, 2023

[44] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018

[45] Mingchen Zhong, Xin Lu, Dong Li, Senyan Xu, Ruixuan Jiang, Xueyang Fu, and Baocai Yin. Compevent: Complex-valued event-rgb fusion for low-light video enhancement and deblurring. In AAAI, 2026

[46] Shangchen Zhou, Peiqing Yang, Jianyi Wang, Yihang Luo, and Chen Change Loy. Upscale-A-Video: Temporal-consistent diffusion model for real-world video super-resolution. In CVPR, 2024

[47] Libo Zhu, Zihan Zhou, Xiaoyang Liu, Weihang Zhang, Keyu Shi, Yifan Fu, and Yulun Zhang. Rifle: Removal of image flicker-banding via latent diffusion enhancement. arXiv preprint arXiv:2509.24644, 2025

A Technical Appendices and Supplementary Materials

A.1 Mechanism of Flicker-banding Artifact Formation

Flicker-banding is fundamentally a visual manifesta...