pith. machine review for the scientific record.

arxiv: 2604.05500 · v2 · submitted 2026-04-07 · 💻 cs.CV

Recognition: no theorem link

CLIP-Guided Data Augmentation for Night-Time Image Dehazing

Pith reviewed 2026-05-10 18:29 UTC · model grok-4.3

classification 💻 cs.CV
keywords nighttime image dehazing · CLIP-guided data screening · domain-aligned augmentation · two-stage training · image restoration · NTIRE challenge · inference-time enhancement

The pith

CLIP similarity screening builds domain-aligned training data for nighttime dehazing, allowing staged adaptation of a standard network without architectural redesign.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that screening external images with a pre-trained CLIP encoder by visual similarity produces training data sufficiently close to scarce nighttime haze targets, reducing domain drift and training instability. Nighttime dehazing combines haze scattering with low illumination, non-uniform lighting, and light interference, which makes naive addition of outside data counterproductive. The solution constructs aligned data, trains NAFNet first on the target domain then on wider patterns, and applies inference-time steps including TLC, eight-fold self-ensemble, and weighted snapshot fusion. If correct, the work shows that data curation guided by similarity can make existing restoration networks practical for this challenging setting rather than requiring new model designs.

Core claim

The central claim is that a pre-trained CLIP visual encoder can screen candidate external samples by similarity to the target nighttime images, thereby constructing training data that stays close to the desired degradation distribution. NAFNet is then adapted in two stages—first locking onto the target domain, then expanding to broader patterns—while inference combines TLC, x8 self-ensemble, and weighted snapshot fusion to raise stability. This yields a unified pipeline of domain-aligned data construction, stage-wise training, and test-time enhancement that solves the task without complex network changes.
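
A minimal sketch of the screening step, under stated assumptions. The paper says only that candidates are screened "by similarity"; scoring each candidate against the mean target embedding and keeping those above a fixed threshold is an assumption made here for illustration, as are the names `screen_candidates` and `threshold`. The toy 3-dimensional vectors stand in for real CLIP image embeddings, which would come from the pre-trained encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def screen_candidates(target_embeddings, candidate_embeddings, threshold):
    """Keep candidates whose similarity to the target centroid is >= threshold.

    Assumption: the aggregation rule (mean target embedding) and the cutoff
    are illustrative; the source does not specify either.
    """
    dim = len(target_embeddings[0])
    count = len(target_embeddings)
    centroid = [sum(e[i] for e in target_embeddings) / count for i in range(dim)]
    scores = [cosine_similarity(c, centroid) for c in candidate_embeddings]
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return kept, scores


# Toy vectors standing in for CLIP embeddings of target and external images.
targets = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
candidates = [[1.0, 0.05, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
kept, scores = screen_candidates(targets, candidates, threshold=0.9)
print(kept)  # indices of candidates that pass the screen
```

Whether a threshold, a top-k cutoff, or a nearest-neighbour rule is used changes how many external samples survive, which is one reason the selection rule matters for reproducibility.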

What carries the argument

CLIP-guided screening of external samples to form domain-aligned training data, followed by two-stage adaptation of NAFNet and inference-time stabilization techniques.

If this is right

  • Domain-aligned data reduces drift and instability when target samples are few and degradations are complex.
  • Two-stage training first secures target-domain performance then widens generalization to varied haze patterns.
  • Inference enhancements such as self-ensemble and snapshot fusion raise output consistency without retraining.
  • The overall pipeline demonstrates that data construction and training strategy can substitute for network redesign in restoration tasks.
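
For readers unfamiliar with the eight-fold self-ensemble named above, the following is a minimal sketch of the usual geometric version: the input is passed through the model under four rotations with and without a horizontal flip, each output is mapped back to the original orientation, and the eight results are averaged. The single-channel list-of-lists image and the `model` callable are assumptions for illustration; the paper's exact implementation is not given in the source.

```python
def rot90(img):
    """Rotate a 2D list-of-lists image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def hflip(img):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in img]


def rotate(img, times):
    """Apply rot90 `times` times (taken modulo 4)."""
    for _ in range(times % 4):
        img = rot90(img)
    return img


def self_ensemble_x8(model, img):
    """Average model outputs over 4 rotations x 2 flip states.

    `model` is any callable mapping an image to an image of the same size.
    """
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for flip in (False, True):
        for r in range(4):
            x = hflip(img) if flip else img
            x = rotate(x, r)
            y = model(x)
            # Undo the transforms in reverse order: rotation first, then flip.
            y = rotate(y, 4 - r)
            if flip:
                y = hflip(y)
            for i in range(h):
                for j in range(w):
                    acc[i][j] += y[i][j] / 8.0
    return acc


# Hypothetical stand-in for a restoration network: doubles every pixel.
def toy_model(x):
    return [[2.0 * v for v in row] for row in x]


demo = self_ensemble_x8(toy_model, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

For a model that is already equivariant to flips and rotations, as the toy one is, the ensemble returns the same result as a single pass; the averaging only helps when the network's outputs differ across orientations.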

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same screening idea could be tested on other low-light or weather-degraded restoration problems where labeled target data remains limited.
  • If CLIP embeddings reliably encode degradation type, the method offers a route to reduce reliance on fully synthetic data generation pipelines.
  • A controlled ablation replacing CLIP selection with random external sampling would directly quantify the contribution of the similarity step.

Load-bearing premise

Screening external samples by CLIP similarity will produce training data close enough to the target nighttime haze distribution that it improves rather than harms adaptation.

What would settle it

Train identical NAFNet models on the same backbone using randomly chosen external samples versus CLIP-screened samples, then measure whether the CLIP version produces measurably higher restoration quality on held-out nighttime test images.
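
The comparison described above could be scored along the following lines. This is a sketch, not the challenge's evaluation code: it uses the standard PSNR definition, treats each image as a flat list of values in [0, 1], and all pixel values below are made-up placeholders used only to exercise the functions, not results from the paper.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_psnr(preds, targets):
    """Average PSNR over a set of (prediction, ground truth) pairs."""
    return sum(psnr(p, t) for p, t in zip(preds, targets)) / len(targets)


# Placeholder data: one tiny "image" per set, purely illustrative.
ground_truth = [[0.2, 0.4, 0.6, 0.8]]
clip_screened_outputs = [[0.21, 0.41, 0.59, 0.79]]    # hypothetical model A
random_sampled_outputs = [[0.25, 0.45, 0.55, 0.75]]   # hypothetical model B

gain_db = (mean_psnr(clip_screened_outputs, ground_truth)
           - mean_psnr(random_sampled_outputs, ground_truth))
print(f"PSNR gain of CLIP-screened over random: {gain_db:.2f} dB")
```

A positive gain on held-out nighttime images, repeated across seeds, would support the screening step; a gain near zero would suggest the external data helps regardless of how it is selected.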

Figures

Figures reproduced from arXiv: 2604.05500 by Gengjia Chang, Shuhong Liu, Weijun Yuan, Xining Ge, Xuyang Li.

Figure 1: Overall pipeline of our proposed solution, showing CLIP-guided data augmentation, stage-wise NAFNet training, and ...
Figure 2: Qualitative comparison on representative NHM-20 samples. From left to right: Input, NAFNet-100k, Weighted Ensemble, and ...
Original abstract

Nighttime image dehazing faces a more complex degradation pattern than its daytime counterpart, as haze scattering couples with low illumination, non-uniform lighting, and strong light interference. Under limited supervision, this complexity aggravates domain drift and training instability, since target-domain samples are scarce while naively introducing external data may weaken adaptation due to distribution mismatch. This paper presents our solution to the NTIRE 2026 Night Time Image Dehazing Challenge, built as a unified framework that integrates domain-aligned data construction, stage-wise training, and inference-time enhancement. Specifically, a pre-trained CLIP visual encoder screens candidate external samples by similarity to construct training data closer to the target domain. NAFNet is then trained in two stages, first adapting to the target domain and then expanding to broader degradation patterns. At inference time, TLC, x8 self-ensemble, and weighted snapshot fusion are combined to improve output stability. Rather than relying on complex network redesign, the proposed framework offers a practical and effective pipeline for nighttime image dehazing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a practical pipeline for nighttime image dehazing that addresses complex degradations (haze coupled with low illumination and non-uniform lighting) under limited target-domain data. It uses a pre-trained CLIP visual encoder to screen external samples by similarity, constructing training data closer to the NTIRE target domain. NAFNet is then trained in two stages (first adapting to the target domain, then broadening to wider patterns), followed by inference enhancements combining the Test-time Local Converter (TLC), x8 self-ensemble, and weighted snapshot fusion. The central claim is that this data-aligned, stage-wise approach yields effective dehazing without requiring network redesign.

Significance. If the results hold, the work would demonstrate a pragmatic alternative to architectural innovation in image restoration by leveraging foundation-model embeddings for domain-aligned data curation and staged adaptation. This could be valuable for other low-data restoration tasks where target samples are scarce and naive external data risks mismatch, potentially influencing challenge submissions and applied CV pipelines.

major comments (1)
  1. [CLIP-guided screening subsection] The claim that CLIP cosine similarity on visual embeddings constructs training data sufficiently close to the target domain to avoid distribution mismatch is load-bearing for the first-stage adaptation but unsupported. CLIP was pretrained on semantic image-text pairs, not physics-based degradation statistics; nothing guarantees that high-similarity samples match in haze density, contrast, or lighting histograms. Without an ablation (e.g., CLIP-selected vs. random external data) showing reduced drift or improved adaptation metrics, the subsequent broadening stage and inference tricks cannot be shown to compensate.
minor comments (2)
  1. [Abstract] The summary of the pipeline is clear but would be strengthened by a brief mention of key quantitative outcomes (e.g., PSNR/SSIM gains on the challenge test set) to allow readers to gauge effectiveness immediately.
  2. [Inference-time enhancement paragraph] Specify the exact weighting scheme for snapshot fusion and the number of snapshots used; this detail is needed for reproducibility.
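
To make the second minor comment concrete, a weighted snapshot fusion can be sketched as a normalized weighted average of the outputs produced by several saved checkpoints. The function name `fuse_snapshots`, the two-snapshot example, and the weights are all hypothetical; the manuscript's actual weights and snapshot count are exactly what the comment asks the authors to report.

```python
def fuse_snapshots(outputs, weights):
    """Weighted average of per-snapshot outputs.

    `outputs` holds one flat pixel list per snapshot; `weights` holds one
    non-negative weight per snapshot. Weights are normalized to sum to 1.
    """
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per snapshot output")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    n_pixels = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(normalized, outputs))
        for i in range(n_pixels)
    ]


# Two hypothetical snapshots; the later one is trusted three times as much.
fused = fuse_snapshots([[0.0, 1.0], [1.0, 1.0]], weights=[1.0, 3.0])
print(fused)
```

Both the weight vector and the choice of which checkpoints to include are free design choices, so two implementations that agree on everything else can still produce different outputs.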

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the practical value of our pipeline for nighttime dehazing under limited target data. We address the single major comment point-by-point below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [CLIP-guided screening subsection] The claim that CLIP cosine similarity on visual embeddings constructs training data sufficiently close to the target domain to avoid distribution mismatch is load-bearing for the first-stage adaptation but unsupported. CLIP was pretrained on semantic image-text pairs, not physics-based degradation statistics; nothing guarantees that high-similarity samples match in haze density, contrast, or lighting histograms. Without an ablation (e.g., CLIP-selected vs. random external data) showing reduced drift or improved adaptation metrics, the subsequent broadening stage and inference tricks cannot be shown to compensate.

    Authors: We agree that CLIP was pretrained on semantic image-text pairs and therefore does not explicitly encode physics-based quantities such as haze optical depth or illumination histograms. Our design choice rests on the empirical observation that, for nighttime scenes, high cosine similarity in CLIP space tends to retrieve images sharing comparable low-level appearance (content, contrast, and lighting distribution), which we confirmed by qualitative review of the selected samples. The two-stage training protocol is explicitly intended to first anchor the model to the target domain and then broaden its coverage. Nevertheless, we accept that the current manuscript lacks a direct quantitative ablation isolating the contribution of CLIP screening versus random external data. In the revised version we will add this ablation, reporting (i) distribution-shift metrics (e.g., Fréchet distance on VGG features) between the selected training set and the target domain, and (ii) the resulting PSNR/SSIM gains on the NTIRE validation set. This will allow readers to assess whether the screening step measurably reduces drift before the broadening and inference stages are applied.
    revision: yes
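
The rebuttal does not say how the Fréchet distance would be computed. One common choice, assumed here rather than taken from the paper, is the form used by FID-style metrics: fit a Gaussian to the feature vectors of each set and compare the two fits. With mean and covariance $(\mu_s, \Sigma_s)$ for the selected training set and $(\mu_t, \Sigma_t)$ for the target domain, the squared distance is

```latex
d^2\big((\mu_s, \Sigma_s), (\mu_t, \Sigma_t)\big)
  = \lVert \mu_s - \mu_t \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_s + \Sigma_t - 2\,(\Sigma_s \Sigma_t)^{1/2} \right)
```

A smaller value for the CLIP-screened set than for a randomly sampled set of the same size would indicate that screening moves the training data closer to the target domain in that feature space. With few target images the covariance estimates are noisy, so the comparison is only indicative.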

Circularity Check

0 steps flagged

No circularity: pipeline relies on external pre-trained models without self-referential reduction

Full rationale

The paper describes a data-construction pipeline that screens external samples via a pre-trained CLIP encoder, followed by two-stage NAFNet training and standard inference tricks. No equations, fitted parameters, or self-citations are presented that reduce any claimed prediction or uniqueness result to the inputs by construction. The framework is self-contained against external benchmarks (CLIP, NAFNet) and does not invoke author-specific theorems or ansatzes that would create circularity. The reader's noted assumption about CLIP similarity is a correctness concern, not a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the assumption that CLIP embeddings capture domain similarity useful for dehazing data selection and that staged training plus ensembles will stabilize outputs under complex nighttime degradations.

axioms (2)
  • domain assumption: A pre-trained CLIP visual encoder can screen candidate external samples by similarity to construct training data closer to the target domain.
    Invoked to mitigate distribution mismatch when adding external data.
  • domain assumption: Two-stage training (target adaptation, then broader degradation patterns) plus inference enhancements will improve stability without network redesign.
    Central to the practical pipeline claim.

pith-pipeline@v0.9.0 · 5487 in / 1273 out tokens · 32073 ms · 2026-05-10T18:29:26.054019+00:00 · methodology

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dehaze-then-Splat: Generative Dehazing with Physics-Informed 3D Gaussian Splatting for Smoke-Free Novel View Synthesis

    cs.CV 2026-04 unverdicted novelty 5.0

    Dehaze-then-Splat uses per-frame generative dehazing followed by physics-regularized 3D Gaussian Splatting to achieve 20.98 dB PSNR and 0.683 SSIM on the Akikaze scene, a 1.5 dB gain over baseline by mitigating cross-...

  2. 3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models

    cs.CV 2026-04 unverdicted novelty 5.0

    A framework that combines MLLM-based image enhancement with a medium-aware 3D Gaussian Splatting model to reconstruct and render smoke scenes.

  3. Training-Free Model Ensemble for Single-Image Super-Resolution via Strong-Branch Compensation

    cs.CV 2026-04 unverdicted novelty 4.0

    A dual-branch training-free ensemble fuses a hybrid attention network with a Mamba-based model via weighted combination to enhance super-resolution PSNR on DIV2K x4.

  4. Dual-Branch Remote Sensing Infrared Image Super-Resolution

    cs.CV 2026-04 unverdicted novelty 4.0

    Dual-branch fusion of HAT-L and MambaIRv2-L with eight-way ensemble and equal-weight averaging outperforms single branches on PSNR, SSIM, and challenge score for infrared super-resolution.

  5. SmokeGS-R: Physics-Guided Pseudo-Clean 3DGS for Real-World Multi-View Smoke Restoration

    cs.CV 2026-04 conditional novelty 4.0

    SmokeGS-R uses refined dark channel prior for pseudo-clean supervision to train 3DGS geometry, followed by ensemble-based appearance harmonization, achieving PSNR 15.21 and outperforming baselines on smoke restoration...

  6. Beyond Model Design: Data-Centric Training and Self-Ensemble for Gaussian Color Image Denoising

    cs.CV 2026-04 unverdicted novelty 3.0

    Expanding training data diversity, adopting two-stage optimization, and applying geometric self-ensemble raises Restormer performance on Gaussian color denoising at sigma=50 by 3.366 dB PSNR on the NTIRE 2026 validation set.

  7. NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

    cs.CV 2026-04 unverdicted novelty 2.0

    The NTIRE 2026 challenge reports measurable progress in 3D reconstruction pipelines that handle real-world low-light and smoke degradation via the RealX3D benchmark.

Reference graph

Works this paper leans on

78 extracted references · 16 canonical work pages · cited by 7 Pith papers · 12 internal anchors

  1. [1]

    Dense-haze: A benchmark for image dehazing with dense-haze and haze-free images

    Codruta O Ancuti, Cosmin Ancuti, Mateu Sbert, and Radu Timofte. Dense-haze: A benchmark for image dehazing with dense-haze and haze-free images. In2019 IEEE interna- tional conference on image processing (ICIP), pages 1014–

  2. [2]

    Nh-haze: An image dehazing benchmark with non- homogeneous hazy and haze-free images

    Codruta O Ancuti, Cosmin Ancuti, and Radu Timo- fte. Nh-haze: An image dehazing benchmark with non- homogeneous hazy and haze-free images. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 444–445, 2020. 3

  3. [3]

    O-haze: a dehazing bench- mark with real hazy and haze-free outdoor images

    Codruta O Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. O-haze: a dehazing bench- mark with real hazy and haze-free outdoor images. InPro- ceedings of the IEEE conference on computer vision and pat- tern recognition workshops, pages 754–762, 2018. 3

  4. [4]

    Ntire 2020 challenge on non- homogeneous dehazing

    Codruta O Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, and Radu Timofte. Ntire 2020 challenge on non- homogeneous dehazing. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 490–491, 2020

  5. [5]

    Ntire 2021 nonhomogeneous dehazing challenge report

    Codruta O Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, and Radu Timofte. Ntire 2021 nonhomogeneous dehazing challenge report. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 627–646, 2021

  6. [6]

    Ntire 2024 dense and non-homogeneous dehazing challenge report

    Codruta O Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Yidi Liu, Xingbo Wang, Yurui Zhu, Gege Shi, Xin Lu, Xueyang Fu, et al. Ntire 2024 dense and non-homogeneous dehazing challenge report. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6453–6468, 2024. 3

  7. [7]

    NTIRE 2026 Nighttime Image Dehazing Challenge Report

    Radu Ancuti, Alexandru Brateanu, Florin Vasluianu, Raul Balmez, Ciprian Orhei, Codruta Ancuti, Radu Timofte, Cos- min Ancuti, et al. NTIRE 2026 Nighttime Image Dehazing Challenge Report. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026. 2

  8. [8]

    Non-local image dehazing

    Dana Berman, Shai Avidan, et al. Non-local image dehazing. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1674–1682, 2016. 2

  9. [9]

    Retinexformer: One-stage retinex- based transformer for low-light image enhancement

    Yuanhao Cai, Hao Bian, Jing Lin, Haoqian Wang, Radu Tim- ofte, and Yulun Zhang. Retinexformer: One-stage retinex- based transformer for low-light image enhancement. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 12504–12513, 2023. 2

  10. [10]

    GenSmoke-GS: A Multi-Stage Method for Novel View Synthesis from Smoke-Degraded Images Using a Generative Model

    Qida Cao, Xinyuan Hu, Changyue Shi, Jiajun Ding, Zhou Yu, and Jun Yu. Gensmoke-gs: A multi-stage method for novel view synthesis from smoke-degraded images using a generative model.arXiv preprint arXiv:2604.03039, 2026. 1

  11. [11]

    Emerg- ing properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 3

  12. [12]

    Beyond Model Design: Data-Centric Training and Self-Ensemble for Gaussian Color Image Denoising

    Gengjia Chang, Xining Ge, Weijun Yuan, Zhan Li, Qiurong Song, Luen Zhu, and Shuhong Liu. Beyond model design: Data-centric training and self-ensemble for gaussian color image denoising.arXiv preprint arXiv:2604.11468, 2026. 3

  13. [13]

    Training-Free Model Ensemble for Single-Image Super-Resolution via Strong-Branch Compensation

    Gengjia Chang, Xining Ge, Weijun Yuan, Zhan Li, Qiurong Song, Luen Zhu, and Shuhong Liu. Training-free model en- semble for single-image super-resolution via strong-branch compensation.arXiv preprint arXiv:2604.11564, 2026. 3

  14. [14]

    Simple baselines for image restoration

    Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. InEuropean confer- ence on computer vision, pages 17–33. Springer, 2022. 1, 3, 4

  15. [15]

    Hinet: Half instance normalization network for image restoration

    Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Cheng- peng Chen. Hinet: Half instance normalization network for image restoration. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 182– 192, 2021. 1

  16. [16]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Ge- offrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on ma- chine learning, pages 1597–1607. PmLR, 2020. 3

  17. [17]

    Dehaze-then-Splat: Generative Dehazing with Physics-Informed 3D Gaussian Splatting for Smoke-Free Novel View Synthesis

    Yuchao Chen and Hanqing Wang. Dehaze-then-splat: Gen- erative dehazing with physics-informed 3d gaussian splat- ting for smoke-free novel view synthesis.arXiv preprint arXiv:2604.13589, 2026. 1

  18. [18]

    Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention.IEEE transactions on image pro- cessing, 33:1002–1015, 2024

    Zixuan Chen, Zewei He, and Zhe-Ming Lu. Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention.IEEE transactions on image pro- cessing, 33:1002–1015, 2024. 2

  19. [19]

    Improving image restoration by revisiting global information aggregation

    Xiaojie Chu, Liangyu Chen, Chengpeng Chen, and Xin Lu. Improving image restoration by revisiting global information aggregation. InEuropean Conference on Computer Vision, pages 53–71. Springer, 2022. 1, 3, 4

  20. [20]

    Unifying color and lightness correction with view-adaptive curve ad- justment for robust 3d novel view synthesis.arXiv preprint arXiv:2602.18322, 2026

    Ziteng Cui, Shuhong Liu, Xiaoyu Dong, Xuangeng Chu, Lin Gu, Ming-Hsuan Yang, and Tatsuya Harada. Unifying color and lightness correction with view-adaptive curve ad- justment for robust 3d novel view synthesis.arXiv preprint arXiv:2602.18322, 2026. 1

  21. [21]

    Multi-scale boosted de- hazing network with dense feature fusion

    Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, and Ming-Hsuan Yang. Multi-scale boosted de- hazing network with dense feature fusion. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2157–2167, 2020. 2

  22. [22]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 3

  23. [23]

    Single image dehazing.ACM transactions on graphics (TOG), 27(3):1–9, 2008

    Raanan Fattal. Single image dehazing.ACM transactions on graphics (TOG), 27(3):1–9, 2008. 2

  24. [24]

    SmokeGS-R: Physics-Guided Pseudo-Clean 3DGS for Real-World Multi-View Smoke Restoration

    Xueming Fu and Lixia Han. Smokegs-r: Physics-guided pseudo-clean 3dgs for real-world multi-view smoke restora- tion.arXiv preprint arXiv:2604.05301, 2026. 1

  25. [25]

    Dual-Branch Remote Sensing Infrared Image Super-Resolution

    Xining Ge, Gengjia Chang, Weijun Yuan, Zhan Li, Zhanglu Chen, Boyang Yao, Yihang Chen, Yifan Deng, and Shuhong Liu. Dual-branch remote sensing infrared image super- resolution.arXiv preprint arXiv:2604.10112, 2026. 3 7

  26. [26]

    Image dehazing transformer with transmission-aware 3d position embedding

    Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022. 2

  27. [27]

    Zero-reference deep curve estimation for low-light image enhancement

    Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, and Runmin Cong. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 1780–1789, 2020. 2

  28. [28]

    Reliability-aware staged low-light gaussian splatting.ResearchGate preprint, 2026

    Haojie Guo and Ke Xian. Reliability-aware staged low-light gaussian splatting.ResearchGate preprint, 2026. 1

  29. [29]

    Onerestore: A universal restoration frame- work for composite degradation

    Yu Guo, Yuan Gao, Yuxu Lu, Huilin Zhu, Ryan Wen Liu, and Shengfeng He. Onerestore: A universal restoration frame- work for composite degradation. InEuropean conference on computer vision, pages 255–272. Springer, 2024. 1

  30. [30]

    Momentum contrast for unsupervised visual rep- resentation learning

    Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual rep- resentation learning. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 9729–9738, 2020. 3

  31. [31]

    Single image haze removal using dark channel prior.IEEE transactions on pat- tern analysis and machine intelligence, 33(12):2341–2353,

    Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior.IEEE transactions on pat- tern analysis and machine intelligence, 33(12):2341–2353,

  32. [32]

    Snapshot Ensembles: Train 1, get M for free

    Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E Hopcroft, and Kilian Q Weinberger. Snapshot ensembles: Train 1, get m for free.arXiv preprint arXiv:1704.00109,

  33. [33]

    Scaling up visual and vision-language representa- tion learning with noisy text supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representa- tion learning with noisy text supervision. InInternational conference on machine learning, pages 4904–4916. PMLR,

  34. [34]

    Enlightengan: Deep light enhancement without paired supervision.IEEE transactions on image processing, 30:2340–2349, 2021

    Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang. Enlightengan: Deep light enhancement without paired supervision.IEEE transactions on image processing, 30:2340–2349, 2021. 2

  35. [35]

    Benchmarking single- image dehazing and beyond.IEEE transactions on image processing, 28(1):492–505, 2018

    Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single- image dehazing and beyond.IEEE transactions on image processing, 28(1):492–505, 2018. 1

  36. [36]

    All-in-one image restoration for unknown corruption

    Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-in-one image restoration for unknown corruption. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17452– 17462, 2022. 1

  37. [37]

    Semi- supervised image dehazing.IEEE Transactions on Image Processing, 29:2766–2779, 2019

    Lerenhan Li, Yunlong Dong, Wenqi Ren, Jinshan Pan, Changxin Gao, Nong Sang, and Ming-Hsuan Yang. Semi- supervised image dehazing.IEEE Transactions on Image Processing, 29:2766–2779, 2019. 2

  38. [38]

    Densesplat: Densifying gaussian splatting slam with neural radiance prior.IEEE Transactions on Visualization & Computer Graphics, (01):1–14, 2025

    Mingrui Li, Shuhong Liu, Tianchen Deng, and Hongyu Wang. Densesplat: Densifying gaussian splatting slam with neural radiance prior.IEEE Transactions on Visualization & Computer Graphics, (01):1–14, 2025. 1

  39. [39]

    Sgs-slam: Semantic gaussian splatting for neural dense slam

    Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, and Hongyu Wang. Sgs-slam: Semantic gaussian splatting for neural dense slam. InEuro- pean Conference on Computer Vision, pages 163–179, 2025. 1

  40. [40]

    Nighttime haze removal with glow and multiple light colors

    Yu Li, Robby T Tan, and Michael S Brown. Nighttime haze removal with glow and multiple light colors. InProceed- ings of the IEEE international conference on computer vi- sion, pages 226–234, 2015. 2

  41. [41]

    Towards robust event-guided low-light image enhancement: a large-scale real-world event-image dataset and novel approach

    Guoqiang Liang, Kanghao Chen, Hangyu Li, Yunfan Lu, and Lin Wang. Towards robust event-guided low-light image enhancement: a large-scale real-world event-image dataset and novel approach. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 23–33, 2024. 2

  42. [42]

    Swinir: Image restoration us- ing swin transformer

    Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration us- ing swin transformer. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 1833–1844,

  43. [43]

    Nighthaze: Nighttime image dehazing via self-prior learning

    Beibei Lin, Yeying Jin, Yan Wending, Wei Ye, Yuan Yuan, and Robby T Tan. Nighthaze: Nighttime image dehazing via self-prior learning. InProceedings of the AAAI Confer- ence on Artificial Intelligence, volume 39, pages 5209–5217,

  44. [44]

    Towards multi-domain single image dehazing via test-time training

    Huan Liu, Zijun Wu, Liangyan Li, Sadaf Salehkalaibar, Jun Chen, and Keyan Wang. Towards multi-domain single image dehazing via test-time training. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5831–5840, 2022. 2, 3

  45. [46]

    NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

    Shuhong Liu, Chenyu Bao, Ziteng Cui, Xuangeng Chu, Bin Ren, Lin Gu, Xiang Chen, Mingrui Li, Long Ma, Marcos V . Conde, Radu Timofte, et al. Ntire 2026 3D restoration and reconstruction in adverse conditions: RealX3D challenge re- sults.arXiv preprint arXiv:2604.04135, 2026. 1

  46. [47]

    Realx3d: A physically-degraded 3d benchmark for multi-view visual restoration and recon- struction.arXiv preprint arXiv:2512.23437, 2025

    Shuhong Liu, Chenyu Bao, Ziteng Cui, Yun Liu, Xuangeng Chu, Lin Gu, Marcos V Conde, Ryo Umagami, Tomohiro Hashimoto, Zijian Hu, et al. Realx3d: A physically-degraded 3d benchmark for multi-view visual restoration and recon- struction.arXiv preprint arXiv:2512.23437, 2026. 1

  47. [48]

    Deraings: Gaussian splatting for enhanced scene reconstruction in rainy environments.Proceedings of the AAAI Conference on Artificial Intelligence, 39(5):5558– 5566, 2025

    Shuhong Liu, Xiang Chen, Hongming Chen, Quanfeng Xu, and Mingrui Li. Deraings: Gaussian splatting for enhanced scene reconstruction in rainy environments.Proceedings of the AAAI Conference on Artificial Intelligence, 39(5):5558– 5566, 2025. 1

  48. [49]

    Mg-slam: Structure gaussian splatting slam with manhattan world hy- pothesis.IEEE Transactions on Automation Science and En- gineering, 22:17034–17049, 2025

    Shuhong Liu, Tianchen Deng, Heng Zhou, Liuzhuozheng Li, Hongyu Wang, Danwei Wang, and Mingrui Li. Mg-slam: Structure gaussian splatting slam with manhattan world hy- pothesis.IEEE Transactions on Automation Science and En- gineering, 22:17034–17049, 2025. 1

  49. [50]

    De- noising the deep sky: Physics-based ccd noise formation for astronomical imaging.arXiv preprint arXiv:2601.23276,

    Shuhong Liu, Xining Ge, Ziying Gu, Lin Gu, Ziteng Cui, Xuangeng Chu, Jun Liu, Dong Li, and Tatsuya Harada. De- noising the deep sky: Physics-based ccd noise formation 8 for astronomical imaging.arXiv preprint arXiv:2601.23276,

  50. [51]

    I2-nerf: Learning neural radiance fields un- der physically-grounded media interactions

    Shuhong Liu, Lin Gu, Ziteng Cui, Xuangeng Chu, and Tat- suya Harada. I2-nerf: Learning neural radiance fields un- der physically-grounded media interactions. InAdvances in Neural Information Processing Systems, 2025. 1

  51. [52]

    Grid- dehazenet: Attention-based multi-scale network for image dehazing

    Xiaohong Liu, Yongrui Ma, Zhihao Shi, and Jun Chen. Grid- dehazenet: Attention-based multi-scale network for image dehazing. InProceedings of the IEEE/CVF international conference on computer vision, pages 7314–7323, 2019. 2

  52. [53]

    ELoG-GS: Dual-Branch Gaussian Splatting with Luminance-Guided Enhancement for Extreme Low-light 3D Reconstruction

    Yuhao Liu, Dingju Wang, and Ziyang Zheng. Elog-gs: Dual-branch gaussian splatting with luminance-guided enhancement for extreme low-light 3d reconstruction. arXiv preprint arXiv:2604.12592, 2026. 1

  53. [54]

    Nighttime image dehazing based on variational decomposition model

    Yun Liu, Zhongsheng Yan, Aimin Wu, Tian Ye, and Yuche Li. Nighttime image dehazing based on variational decomposition model. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 640–649,

  54. [55]

    Efficient image dehazing with boundary constraint and contextual regularization

    Gaofeng Meng, Ying Wang, Jiangyong Duan, Shiming Xiang, and Chunhong Pan. Efficient image dehazing with boundary constraint and contextual regularization. In Proceedings of the IEEE international conference on computer vision, pages 617–624, 2013. 2

  55. [56]

    Interactive (de) weathering of an image using physical models

    Srinivasa Narasimhan. Interactive (de) weathering of an image using physical models. 2000. 2

  56. [57]

    Vision and the atmosphere. International journal of computer vision, 48(3):233–254, 2002

    Srinivasa G Narasimhan and Shree K Nayar. Vision and the atmosphere. International journal of computer vision, 48(3):233–254, 2002. 2

  57. [58]

    Ffa-net: Feature fusion attention network for single image dehazing

    Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. Ffa-net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 11908–11915, 2020. 2

  58. [59]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021. 3

  59. [60]

    The Eleventh NTIRE 2026 Efficient Super-Resolution Challenge Report

    Bin Ren, Hang Guo, Yan Shu, Jiaqi Ma, Ziteng Cui, Shuhong Liu, Guofeng Mei, Lei Sun, Zongwei Wu, Fahad Shahbaz Khan, Salman Khan, Radu Timofte, Yawei Li, et al. The eleventh NTIRE 2026 efficient super-resolution challenge report. arXiv preprint arXiv:2604.03198, 2026. 1

  60. [61]

    Specularity factorization for low-light enhancement

    Saurabh Saini and PJ Narayanan. Specularity factorization for low-light enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1–12, 2024. 2

  61. [62]

    Non homogeneous realistic single image dehazing

    Lithesh Shetty et al. Non homogeneous realistic single image dehazing. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 548–555,

  62. [63]

    Zero-ig: Zero-shot illumination-guided joint denoising and adaptive enhancement for low-light images

    Yiqi Shi, Duo Liu, Liguo Zhang, Ye Tian, Xuezhi Xia, and Xiaojing Fu. Zero-ig: Zero-shot illumination-guided joint denoising and adaptive enhancement for low-light images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3015–3024, 2024. 2

  63. [64]

    Data efficient single image dehazing via adversarial auto-augmentation and extended atmospheric scattering model

    Pranjay Shyam and HyunJin Yoo. Data efficient single image dehazing via adversarial auto-augmentation and extended atmospheric scattering model. In Proceedings of the IEEE/CVF international conference on computer vision, pages 227–237, 2023. 2

  64. [65]

    Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023

    Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023. 2

  65. [66]

    Visibility in bad weather from a single image

    Robby T Tan. Visibility in bad weather from a single image. In 2008 IEEE conference on computer vision and pattern recognition, pages 1–8. IEEE, 2008. 2

  66. [67]

    Maxim: Multi-axis mlp for image processing

    Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. Maxim: Multi-axis mlp for image processing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5769–5780, 2022. 1

  67. [68]

    Uformer: A general u-shaped transformer for image restoration

    Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17683–17693, 2022. 1

  68. [69]

    Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement

    Wenhui Wu, Jian Weng, Pingping Zhang, Xu Wang, Wenhan Yang, and Jianmin Jiang. Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5901–5910, 2022. 2

  69. [70]

    Learning to restore low-light images via decomposition-and-enhancement

    Ke Xu, Xin Yang, Baocai Yin, and Rynson WH Lau. Learning to restore low-light images via decomposition-and-enhancement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2281–2290, 2020. 2

  70. [71]

    Restormer: Efficient transformer for high-resolution image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739,

  71. [72]

    Learning enriched features for real image restoration and enhancement

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Learning enriched features for real image restoration and enhancement. In European conference on computer vision, pages 492–511. Springer, 2020. 1

  72. [73]

    Multi-stage progressive image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021. 1

  73. [74]

    Depth information assisted collaborative mutual promotion network for single image dehazing

    Yafei Zhang, Shen Zhou, and Huafeng Li. Depth information assisted collaborative mutual promotion network for single image dehazing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2846–2855, 2024. 2

  74. [75]

    Kindling the darkness: A practical low-light image enhancer

    Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM international conference on multimedia, pages 1632–1640, 2019. 2

  75. [76]

    3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models

    Xinye Zheng, Fei Wang, Yiqi Nie, Kun Li, Junjie Chen, Jiaqi Zhao, Yanyan Wei, and Zhiliang Wu. 3d smoke scene reconstruction guided by vision priors from multimodal large language models. arXiv preprint arXiv:2604.05687, 2026. 1

  76. [77]

    Mod-slam: Monocular dense mapping for unbounded 3d scene reconstruction. IEEE Robotics and Automation Letters, 10(1):484–491, 2024

    Heng Zhou, Zhetao Guo, Yuxiang Ren, Shuhong Liu, Lechen Zhang, Kaidi Zhang, and Mingrui Li. Mod-slam: Monocular dense mapping for unbounded 3d scene reconstruction. IEEE Robotics and Automation Letters, 10(1):484–491, 2024. 1

  77. [78]

    A fast single image haze removal algorithm using color attenuation prior

    Qingsong Zhu, Jiaming Mai, and Ling Shao. A fast single image haze removal algorithm using color attenuation prior. IEEE transactions on image processing, 24(11):3522–3533,

  78. [79]

    Naka-GS: A Bionics-inspired Dual-Branch Naka Correction and Progressive Point Pruning for Low-Light 3DGS

    Runyu Zhu, SiXun Dong, Zhiqiang Zhang, Qingxia Ye, and Zhihua Xu. Naka-gs: A bionics-inspired dual-branch naka correction and progressive point pruning for low-light 3dgs. arXiv preprint arXiv:2604.11142, 2026. 1