BVI-RLV: A Fully Registered Dataset for Low-Light Video Enhancement
Pith reviewed 2026-05-25 08:48 UTC · model grok-4.3
The pith
BVI-RLV supplies over 30k sub-pixel registered low-light to normal-light video frame pairs from 40 scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BVI-RLV comprises over 30k paired frames from 40 diverse scenes captured under two low-light conditions and aligned to normal-light ground truth. Sub-pixel registration holds for 99.24 percent of the full-HD data through motorized dolly motion combined with image-based refinement, while covering varied motion types and realistic temporal noise. Registration proves essential for supervised learning, delivering up to 5.85 dB PSNR gains over unregistered training, and models trained on the dataset outperform those from prior collections in cross-dataset tests, including real-world outdoor scenes.
What carries the argument
Motorized dolly movement combined with image-based refinement to produce sub-pixel accurate alignment between low-light and normal-light video frames.
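As a rough illustration of what image-based refinement can look like, the sketch below estimates a fractional shift between two 1-D signals by cross-correlating them and fitting a parabola around the correlation peak. This is a generic technique chosen for illustration only: the page does not say which refinement algorithm BVI-RLV uses, and the names `gaussian_bump`, `cross_correlation`, and `estimate_shift` are hypothetical.

```python
import math


def gaussian_bump(centre, length=64, sigma=3.0):
    """Synthetic 1-D test signal: a Gaussian bump centred at `centre`."""
    return [math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2)) for i in range(length)]


def cross_correlation(a, b, max_lag):
    """Return {lag: sum_i a[i] * b[i + lag]} for integer lags in [-max_lag, max_lag]."""
    n = len(a)
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += a[i] * b[j]
        scores[lag] = total
    return scores


def estimate_shift(a, b, max_lag=5):
    """Estimate how far b is shifted relative to a, with sub-sample precision."""
    scores = cross_correlation(a, b, max_lag)
    # Search interior lags only, so both neighbours of the peak exist.
    best = max(range(-max_lag + 1, max_lag), key=lambda k: scores[k])
    left, centre, right = scores[best - 1], scores[best], scores[best + 1]
    # Vertex of the parabola through the three samples around the peak.
    denom = left - 2.0 * centre + right
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return best + offset


reference = gaussian_bump(20.0)
shifted = gaussian_bump(22.4)  # same bump moved by 2.4 samples
shift = estimate_shift(reference, shifted)
```

The integer peak gives the coarse alignment and the parabolic vertex supplies the fractional part; a real 2-D pipeline would work on image patches or matched features rather than a single synthetic bump.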
If this is right
- Training enhancement networks on the registered pairs raises PSNR by as much as 5.85 dB relative to unregistered versions of the same data.
- Models trained on BVI-RLV exceed the cross-dataset performance of models trained on existing low-light collections.
- The dataset supports training that generalizes to real-world outdoor low-light video.
- Baseline results for CNN, Transformer, Mamba, and diffusion models become available for direct comparison.
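To put the 5.85 dB figure in context, the sketch below computes PSNR from the mean squared error and converts a PSNR gain into the equivalent reduction in MSE. The function names `psnr` and `mse_ratio_from_gain` and the 8-bit peak value of 255 are assumptions for illustration; the page does not state the bit depth or evaluation code used in the paper.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_from_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


ratio = mse_ratio_from_gain(5.85)
```

Because PSNR is logarithmic in MSE, a gain of 5.85 dB corresponds to a mean squared error roughly 3.8 times smaller, assuming the same peak value on both sides of the comparison.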
Where Pith is reading between the lines
- Precise frame alignment may enable models to exploit temporal correlations more effectively than misalignment permits.
- The capture method could be adapted to other video tasks that require exact low-light to reference pairing.
- Public release of the paired sequences may allow researchers to test whether registration quality correlates with gains in temporal consistency metrics.
- Superior outdoor performance hints that the dataset captures noise statistics closer to uncontrolled environments than ND-filter approaches.
Load-bearing premise
The dolly motion and refinement process yields alignments that stay sub-pixel accurate and artifact-free across all scenes without introducing systematic biases into downstream model training.
What would settle it
A direct comparison of model performance when trained on BVI-RLV pairs versus the same scenes captured with handheld or static-camera methods that lack the dolly alignment step.
Original abstract
Low-light videos often exhibit spatiotemporally incoherent noise, compromising visibility and degrading performance in computer vision applications. A major challenge for enhancing such content using deep learning lies in the scarcity of pixel-aligned, high-quality training data. We introduce BVI-RLV, a fully registered low-light video dataset comprising over 30k paired frames from 40 diverse scenes under two low-light conditions, each aligned with normal-light ground truth. Unlike existing datasets that rely on neutral density (ND) filters or suffer from misalignment issues, BVI-RLV achieves sub-pixel registration for 99.24% of data at full HD resolution across dynamic motion scenarios using a motorized dolly and image-based refinement. The dataset covers a wide range of motion types and realistic temporal noise. We also provide baseline implementations using four representative architectures: Convolutional Neural Network (CNN), Transformer, State Space Model (Mamba), and Diffusion Model (DM). Experiments demonstrate that registration is crucial for supervised learning, yielding up to 5.85 dB PSNR improvement compared to unregistered training. Models trained on BVI-RLV outperform those trained on existing datasets in cross-dataset evaluations, achieving superior performance even in real-world outdoor scenes. Our dataset is publicly available at https://doi.org/10.21227/mzny-8c77.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents BVI-RLV, a new dataset of >30k paired low-light/normal-light video frames across 40 scenes captured with a motorized dolly plus image-based refinement, claiming 99.24% sub-pixel registration at full HD. It supplies baseline results for CNN, Transformer, Mamba, and diffusion models, reports up to 5.85 dB PSNR gain from using registered versus unregistered pairs, and shows superior cross-dataset generalization including on real outdoor scenes.
Significance. If the registration accuracy and lack of systematic bias hold, the dataset would fill a documented gap in aligned low-light video data and enable more reliable supervised training; the public release and multi-architecture baselines are concrete strengths that would support reproducibility and further work in the area.
major comments (2)
- [Abstract / registration section] Abstract and registration-method description: the headline 99.24% sub-pixel registration figure is presented without an independent validation metric (e.g., residual error against fiducial markers, multi-view consistency, or external reference alignment); if the percentage is derived solely from internal convergence of the image-based refinement step, it cannot rule out consistent sub-pixel biases that would affect downstream supervised training and the reported 5.85 dB gain.
- [Experiments / ablation studies] Experiments section (cross-dataset and registration-ablation results): the claim that registration is “crucial” and yields up to 5.85 dB improvement requires explicit confirmation that the unregistered training baseline used identical data volume, augmentation, optimizer schedule, and convergence criteria; without those controls the PSNR delta cannot be attributed solely to alignment quality.
minor comments (1)
- [Dataset description] The motion-type taxonomy and noise-characterization details would benefit from an explicit table or figure summarizing the 40 scenes.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below.
- Referee: [Abstract / registration section] Abstract and registration-method description: the headline 99.24% sub-pixel registration figure is presented without an independent validation metric (e.g., residual error against fiducial markers, multi-view consistency, or external reference alignment); if the percentage is derived solely from internal convergence of the image-based refinement step, it cannot rule out consistent sub-pixel biases that would affect downstream supervised training and the reported 5.85 dB gain.
  Authors: The 99.24% figure is computed from the residual displacement after the motorized dolly plus image-based refinement, with sub-pixel defined as <1 pixel error via feature matching. We agree this is an internal metric and does not provide fully independent validation (e.g., fiducial markers). In revision we will explicitly detail the computation, add discussion of possible systematic biases, and include multi-frame consistency checks from static scenes as supporting evidence.
  Revision: yes
- Referee: [Experiments / ablation studies] Experiments section (cross-dataset and registration-ablation results): the claim that registration is “crucial” and yields up to 5.85 dB improvement requires explicit confirmation that the unregistered training baseline used identical data volume, augmentation, optimizer schedule, and convergence criteria; without those controls the PSNR delta cannot be attributed solely to alignment quality.
  Authors: The unregistered baseline used identical data volume, augmentations, optimizer, schedule, and convergence criteria; the sole difference was pair alignment. We will revise the experiments section to state these controls explicitly.
  Revision: yes
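The first response describes the registration figure as the share of data whose residual displacement is under one pixel. A minimal sketch of that kind of tally follows, assuming residuals are available as per-frame `(dx, dy)` offsets in pixels and that the Euclidean magnitude is what gets thresholded; both are assumptions, as is the name `subpixel_fraction`, since the rebuttal only states "<1 pixel error via feature matching".

```python
import math


def subpixel_fraction(residuals, threshold=1.0):
    """Fraction of (dx, dy) residuals whose magnitude is below `threshold` pixels."""
    if not residuals:
        raise ValueError("residuals must be non-empty")
    hits = sum(1 for dx, dy in residuals if math.hypot(dx, dy) < threshold)
    return hits / len(residuals)


# Hypothetical residuals for four frames, in pixels (not data from the paper).
example = [(0.2, 0.1), (0.9, 0.9), (0.0, 0.5), (1.5, 0.0)]
fraction = subpixel_fraction(example)
```

A tally like this measures only how small the residuals are, not whether they share a direction, which is why the referee asks for an independent check against systematic bias.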
Circularity Check
No circularity: empirical dataset contribution with no derivation chain
The paper presents an empirical dataset and baseline experiments rather than any mathematical derivation or fitted-parameter prediction. The sub-pixel registration claim is a reported measurement from the data collection process (motorized dolly + refinement), not a quantity derived from or fitted to the downstream PSNR results. Cross-dataset evaluations are standard empirical comparisons with no self-referential reduction. No equations, ansatzes, or uniqueness theorems are invoked that collapse to the paper's own inputs. This is a self-contained dataset paper; the reader's circularity score of 1.0 is consistent with the absence of load-bearing circular steps.
Forward citations
Cited by 1 Pith paper
- Towards a General-Purpose Zero-Shot Synthetic Low-Light Image and Video Pipeline: A self-supervised Degradation Estimation Network estimates parameters for physics-informed noise distributions to generate realistic synthetic low-light data, showing gains on noise replication, enhancement, and detec...