CANDLE: Illumination-Invariant Semantic Priors for Color Ambient Lighting Normalization
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 19:52 UTC · model grok-4.3
The pith
DINOv3 self-supervised features stay consistent across colored and ambient lighting, enabling accurate recovery of intrinsic object colors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DINOv3 features remain highly consistent between colored-light inputs and ambient-lit ground truth; this property is exploited as illumination-robust semantic priors inside an encoder-decoder network that uses DINO Omni-layer Guidance to inject multi-layer features adaptively and color-frequency refinement modules (BFACG + SFFB) to suppress chromatic artifacts.
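The consistency claim lends itself to a direct check: extract DINO features for a colored-light image and its ambient-lit counterpart, then compare them layer by layer. A minimal sketch of such a comparison, assuming features have already been extracted as per-layer (tokens, dim) arrays (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def layerwise_cosine(feats_a, feats_b):
    """Mean token-wise cosine similarity per backbone layer.

    feats_a, feats_b: lists of (tokens, dim) arrays, one entry per
    layer, extracted from the two lighting conditions.
    """
    sims = []
    for a, b in zip(feats_a, feats_b):
        num = np.sum(a * b, axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        sims.append(float(np.mean(num / den)))
    return sims

# Identical features score ~1; orthogonal features score ~0.
same = layerwise_cosine([np.ones((5, 3))], [np.ones((5, 3))])
print(same)
```

High per-layer scores between lighting conditions would support treating the features as illumination-robust priors; low scores in some layers would argue for the adaptive, per-layer weighting described below.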
What carries the argument
DINO Omni-layer Guidance (D.O.G.) that adaptively injects multi-layer DINOv3 features into successive encoder stages, paired with a color-frequency refinement design (BFACG + SFFB) in the decoder.
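The injection step can be pictured as a gated residual sum: each selected DINO layer is projected to the encoder stage's shape, then blended in with a learned per-layer weight. A schematic NumPy sketch (the gate values and shapes are made up for illustration; the paper does not specify the internals of D.O.G. here):

```python
import numpy as np

def omni_layer_inject(encoder_feat, dino_feats, gates):
    """Blend several DINO layer features into one encoder stage.

    encoder_feat: (C, H, W) stage activation.
    dino_feats:   list of (C, H, W) DINO features, assumed already
                  projected to the stage's channels and resolution.
    gates:        per-layer scalars (learned in a real model).
    """
    fused = encoder_feat.copy()
    for g, f in zip(gates, dino_feats):
        fused += g * f  # gated residual injection, one gate per layer
    return fused

enc = np.zeros((4, 8, 8))
dino = [np.ones((4, 8, 8)), 2.0 * np.ones((4, 8, 8))]
out = omni_layer_inject(enc, dino, gates=[0.5, 0.25])
print(out[0, 0, 0])  # 0.5*1 + 0.25*2 = 1.0
```

The residual form keeps the encoder's own signal intact when the gates are small, which is one plausible reason an "adaptive" injection would not destroy fine detail.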
If this is right
- The network achieves a 1.22 dB PSNR gain over the previous best method on the CL3AN dataset.
- CANDLE places third overall on the NTIRE 2026 ALN Color Lighting Challenge and second in fidelity on the White Lighting track, with the lowest FID.
- The same design generalizes across both strongly chromatic and luminance-dominant illumination conditions.
- Multi-layer DINO injection plus frequency-aware refinement reduces both highlight saturation and material-dependent reflectance errors.
Where Pith is reading between the lines
- The same DINO consistency property could be tested on video frames to enforce temporally stable color normalization.
- If the priors prove robust to other degradations, the approach might extend to joint lighting and shadow correction tasks.
- Replacing DINOv3 with later self-supervised models could be measured to check whether newer feature sets further reduce residual color bias.
- The method's reliance on a frozen backbone suggests it may run faster on edge devices than methods that retrain large illumination estimators.
Load-bearing premise
The consistency seen in DINOv3 features between colored and ambient conditions will carry over to the chosen network architecture and refinement modules without creating new color errors or losing fine detail.
What would settle it
A controlled test set of scenes with previously unseen material types and extreme colored highlights where the CANDLE output shows larger chromatic deviation from ground-truth ambient images than a strong baseline that does not use DINO features.
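Chromatic deviation in such a test is commonly scored with a CIE76 color difference (ΔE) in Lab space. A minimal sketch, assuming both images have already been converted to Lab (the RGB-to-Lab conversion itself is omitted, and the test images are placeholders):

```python
import numpy as np

def delta_e76(lab_pred, lab_gt):
    """Per-pixel CIE76 color difference between two Lab images.

    lab_pred, lab_gt: (H, W, 3) arrays holding L, a, b channels.
    """
    return np.sqrt(np.sum((lab_pred - lab_gt) ** 2, axis=-1))

pred = np.zeros((2, 2, 3))
gt = np.broadcast_to([3.0, 4.0, 0.0], (2, 2, 3))
print(delta_e76(pred, gt).mean())  # sqrt(3^2 + 4^2) = 5.0
```

A mean ΔE for CANDLE that exceeds a DINO-free baseline on the held-out materials would be the falsifying outcome described above.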
Original abstract
Color ambient lighting normalization under multi-colored illumination is challenging due to severe chromatic shifts, highlight saturation, and material-dependent reflectance. Existing geometric and low-level priors are insufficient for recovering object-intrinsic color when illumination-induced chromatic bias dominates. We observe that DINOv3's self-supervised features remain highly consistent between colored-light inputs and ambient-lit ground truth, motivating their use as illumination-robust semantic priors. We propose CANDLE (Color Ambient Normalization with DINO Layer Enhancement), which introduces DINO Omni-layer Guidance (D.O.G.) to adaptively inject multi-layer DINOv3 features into successive encoder stages, and a color-frequency refinement design (BFACG + SFFB) to suppress decoder-side chromatic collapse and detail contamination. Experiments on CL3AN show a +1.22 dB PSNR gain over the strongest prior method. CANDLE achieves 3rd place on the NTIRE 2026 ALN Color Lighting Challenge and 2nd place in fidelity on the White Lighting track with the lowest FID, confirming strong generalization across both chromatic and luminance-dominant illumination conditions. Code is available at https://github.com/ron941/CANDLE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CANDLE for color ambient lighting normalization under multi-colored illumination. It observes that DINOv3 self-supervised features are consistent across colored-light inputs and ambient ground truth, then introduces DINO Omni-layer Guidance (D.O.G.) to inject multi-layer DINOv3 features into encoder stages, plus color-frequency refinement blocks (BFACG + SFFB) in the decoder. On the CL3AN dataset it reports a +1.22 dB PSNR gain over prior methods, 3rd place on the NTIRE 2026 ALN Color Lighting Challenge, and 2nd place in fidelity on the White Lighting track with the lowest FID.
Significance. If the central claim holds, the work shows that off-the-shelf self-supervised ViT features can serve as illumination-robust semantic priors for low-level normalization tasks without task-specific fine-tuning. The challenge rankings provide external validation of generalization across chromatic and luminance-dominant conditions, and the parameter-free nature of the DINO prior is a clear strength.
major comments (3)
- [Experiments] Experiments section: the abstract reports a +1.22 dB PSNR gain and challenge rankings, yet no error bars, multiple-run statistics, or ablation tables are referenced; without these the robustness of the improvement cannot be assessed and the central claim remains under-supported.
- [Section 3.1] Motivation and Section 3.1: the claim that DINOv3 features remain highly consistent between colored-light inputs and ambient ground truth is stated qualitatively but never quantified (no cosine similarity, CKA, or layer-wise distance metrics are provided), so the motivation for D.O.G. injection rests on an unverified assumption.
- [Ablation studies] Ablation studies: no experiment removes D.O.G. while retaining BFACG + SFFB (or vice versa); the end-to-end PSNR gain therefore does not isolate whether the DINO priors contribute meaningfully or whether the color-frequency refinement blocks alone suffice.
minor comments (2)
- [Abstract] Abstract: the acronyms D.O.G., BFACG, and SFFB appear without parenthetical expansions on first use.
- [Figure 1] Figure 1: the architecture diagram would benefit from explicit arrows or labels indicating the precise encoder stages where D.O.G. features are injected.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the experimental section requires stronger statistical support and more complete ablations to substantiate the claims. We will revise the manuscript to address all three points by adding the requested analyses and metrics.
Point-by-point responses
Referee: [Experiments] Experiments section: the abstract reports a +1.22 dB PSNR gain and challenge rankings, yet no error bars, multiple-run statistics, or ablation tables are referenced; without these the robustness of the improvement cannot be assessed and the central claim remains under-supported.
Authors: We agree that reporting error bars and multiple-run statistics is necessary to demonstrate robustness. In the revised manuscript we will add mean PSNR and standard deviation computed over five independent training runs for the CL3AN results, include error bars on the main comparison table, and clarify that the challenge rankings reflect a single fixed submission. revision: yes
Referee: [Section 3.1] Motivation and Section 3.1: the claim that DINOv3 features remain highly consistent between colored-light inputs and ambient ground truth is stated qualitatively but never quantified (no cosine similarity, CKA, or layer-wise distance metrics are provided), so the motivation for D.O.G. injection rests on an unverified assumption.
Authors: We acknowledge that the consistency claim was presented only qualitatively. We will add a new quantitative analysis subsection in the revised Section 3.1 that reports layer-wise cosine similarity and CKA scores between DINOv3 features of colored-light inputs and their ambient ground-truth counterparts, thereby providing empirical grounding for the D.O.G. design. revision: yes
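Linear CKA, one of the metrics the rebuttal promises, compares representations in a way that is invariant to rotation and scaling of the feature space. A minimal sketch for (samples, dim) feature matrices (names and test values are illustrative):

```python
import numpy as np

def linear_cka(x, y):
    """Linear Centered Kernel Alignment between (n, d1) and (n, d2) features."""
    x = x - x.mean(axis=0)  # center each feature dimension
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, 'fro') ** 2
    den = np.linalg.norm(x.T @ x, 'fro') * np.linalg.norm(y.T @ y, 'fro')
    return num / den

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])
print(round(linear_cka(x, 2.0 * x), 6))  # 1.0 (scale-invariant)
```

Scores near 1 between colored-light and ambient features would quantify the consistency the motivation currently asserts.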
Referee: [Ablation studies] Ablation studies: no experiment removes D.O.G. while retaining BFACG + SFFB (or vice versa); the end-to-end PSNR gain therefore does not isolate whether the DINO priors contribute meaningfully or whether the color-frequency refinement blocks alone suffice.
Authors: We agree that the current ablations do not fully isolate the contribution of each component. We will add a new ablation table in the revised manuscript that includes a variant with D.O.G. removed while retaining BFACG and SFFB, reporting the resulting PSNR to quantify the incremental benefit of the DINO priors. revision: yes
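The multi-run reporting promised in these responses amounts to computing PSNR per run and summarizing with a mean and standard deviation. A minimal sketch (the per-run scores below are hypothetical placeholders, not results from the paper):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Single-pair example: constant 0.5 error -> MSE 0.25 -> 10*log10(4) dB.
print(round(psnr(0.5 * np.ones((4, 4)), np.ones((4, 4))), 2))  # 6.02

# Hypothetical per-run scores; report mean +/- sample std over runs.
runs = np.array([31.9, 32.1, 32.0])
print(f"{runs.mean():.2f} +/- {runs.std(ddof=1):.2f} dB")
```

With only a handful of runs, the sample standard deviation (ddof=1) is the appropriate error bar; a +1.22 dB gain is convincing only if it comfortably exceeds this spread.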
Circularity Check
No circularity: external DINOv3 prior and independent architectural modules
Full rationale
The derivation begins with an empirical observation of DINOv3 feature consistency (external model, not fitted on target lighting data) and proceeds to architectural choices (D.O.G. injection, BFACG + SFFB refinement). No equations reduce a prediction to a fitted input by construction, no self-citations load-bear the central claim, and no ansatz or uniqueness theorem is smuggled in. The reported PSNR gains and challenge rankings are end-to-end empirical results on CL3AN and NTIRE tracks, not self-referential definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: DINOv3 self-supervised features remain highly consistent between colored-light inputs and ambient-lit ground truth.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (tagged unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "We observe that DINOv3's self-supervised features remain highly consistent between colored-light inputs and ambient-lit ground truth, motivating their use as illumination-robust semantic priors."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean — reality_from_one_distinction (tagged unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "DINO Omni-layer Guidance (D.O.G.) ... color-frequency refinement design (BFACG + SFFB)"
What do these tags mean?
- matches — The paper's claim is directly supported by a theorem in the formal canon.
- supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses — The paper appears to rely on the theorem as machinery.
- contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
- unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.