CANDLE: Illumination-Invariant Semantic Priors for Color Ambient Lighting Normalization
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 19:52 UTC · model grok-4.3
The pith
DINOv3 self-supervised features stay consistent across colored and ambient lighting, enabling accurate recovery of intrinsic object colors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DINOv3 features remain highly consistent between colored-light inputs and ambient-lit ground truth; this property is exploited as illumination-robust semantic priors inside an encoder-decoder network that uses DINO Omni-layer Guidance to inject multi-layer features adaptively and color-frequency refinement modules (BFACG + SFFB) to suppress chromatic artifacts.
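The consistency claim lends itself to a direct check: extract DINO features for a colored-light image and its ambient-lit counterpart, then compare them layer by layer. A minimal sketch of such a comparison, assuming features have already been extracted as per-layer (tokens, dim) arrays (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def layerwise_cosine(feats_a, feats_b):
    """Mean token-wise cosine similarity per backbone layer.

    feats_a, feats_b: lists of (tokens, dim) arrays, one entry per
    layer, extracted from the two lighting conditions.
    """
    sims = []
    for a, b in zip(feats_a, feats_b):
        num = np.sum(a * b, axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        sims.append(float(np.mean(num / den)))
    return sims

# Identical features score ~1; orthogonal features score ~0.
same = layerwise_cosine([np.ones((5, 3))], [np.ones((5, 3))])
print(same)
```

High per-layer scores between lighting conditions would support treating the features as illumination-robust priors; low scores in some layers would argue for the adaptive, per-layer weighting described below.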
What carries the argument
DINO Omni-layer Guidance (D.O.G.) that adaptively injects multi-layer DINOv3 features into successive encoder stages, paired with a color-frequency refinement design (BFACG + SFFB) in the decoder.
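The injection step can be pictured as a gated residual sum: each selected DINO layer is projected to the encoder stage's shape, then blended in with a learned per-layer weight. A schematic NumPy sketch (the gate values and shapes are made up for illustration; the paper does not specify the internals of D.O.G. here):

```python
import numpy as np

def omni_layer_inject(encoder_feat, dino_feats, gates):
    """Blend several DINO layer features into one encoder stage.

    encoder_feat: (C, H, W) stage activation.
    dino_feats:   list of (C, H, W) DINO features, assumed already
                  projected to the stage's channels and resolution.
    gates:        per-layer scalars (learned in a real model).
    """
    fused = encoder_feat.copy()
    for g, f in zip(gates, dino_feats):
        fused += g * f  # gated residual injection, one gate per layer
    return fused

enc = np.zeros((4, 8, 8))
dino = [np.ones((4, 8, 8)), 2.0 * np.ones((4, 8, 8))]
out = omni_layer_inject(enc, dino, gates=[0.5, 0.25])
print(out[0, 0, 0])  # 0.5*1 + 0.25*2 = 1.0
```

The residual form keeps the encoder's own signal intact when the gates are small, which is one plausible reason an "adaptive" injection would not destroy fine detail.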
If this is right
- The network achieves a 1.22 dB PSNR gain over the previous best method on the CL3AN dataset.
- CANDLE places third overall on the NTIRE 2026 ALN Color Lighting Challenge and second in fidelity on the White Lighting track, with the lowest FID.
- The same design generalizes across both strongly chromatic and luminance-dominant illumination conditions.
- Multi-layer DINO injection plus frequency-aware refinement reduces both highlight saturation and material-dependent reflectance errors.
Where Pith is reading between the lines
- The same DINO consistency property could be tested on video frames to enforce temporally stable color normalization.
- If the priors prove robust to other degradations, the approach might extend to joint lighting and shadow correction tasks.
- Replacing DINOv3 with later self-supervised models could be measured to check whether newer feature sets further reduce residual color bias.
- The method's reliance on a frozen backbone suggests it may run faster on edge devices than methods that retrain large illumination estimators.
Load-bearing premise
The consistency seen in DINOv3 features between colored and ambient conditions will carry over to the chosen network architecture and refinement modules without creating new color errors or losing fine detail.
What would settle it
A controlled test set of scenes with previously unseen material types and extreme colored highlights where the CANDLE output shows larger chromatic deviation from ground-truth ambient images than a strong baseline that does not use DINO features.
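Chromatic deviation in such a test is commonly scored with a CIE76 color difference (ΔE) in Lab space. A minimal sketch, assuming both images have already been converted to Lab (the RGB-to-Lab conversion itself is omitted, and the test images are placeholders):

```python
import numpy as np

def delta_e76(lab_pred, lab_gt):
    """Per-pixel CIE76 color difference between two Lab images.

    lab_pred, lab_gt: (H, W, 3) arrays holding L, a, b channels.
    """
    return np.sqrt(np.sum((lab_pred - lab_gt) ** 2, axis=-1))

pred = np.zeros((2, 2, 3))
gt = np.broadcast_to([3.0, 4.0, 0.0], (2, 2, 3))
print(delta_e76(pred, gt).mean())  # sqrt(3^2 + 4^2) = 5.0
```

A mean ΔE for CANDLE that exceeds a DINO-free baseline on the held-out materials would be the falsifying outcome described above.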
Original abstract
Color ambient lighting normalization under multi-colored illumination is challenging due to severe chromatic shifts, highlight saturation, and material-dependent reflectance. Existing geometric and low-level priors are insufficient for recovering object-intrinsic color when illumination-induced chromatic bias dominates. We observe that DINOv3's self-supervised features remain highly consistent between colored-light inputs and ambient-lit ground truth, motivating their use as illumination-robust semantic priors. We propose CANDLE (Color Ambient Normalization with DINO Layer Enhancement), which introduces DINO Omni-layer Guidance (D.O.G.) to adaptively inject multi-layer DINOv3 features into successive encoder stages, and a color-frequency refinement design (BFACG + SFFB) to suppress decoder-side chromatic collapse and detail contamination. Experiments on CL3AN show a +1.22 dB PSNR gain over the strongest prior method. CANDLE achieves 3rd place on the NTIRE 2026 ALN Color Lighting Challenge and 2nd place in fidelity on the White Lighting track with the lowest FID, confirming strong generalization across both chromatic and luminance-dominant illumination conditions. Code is available at https://github.com/ron941/CANDLE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CANDLE for color ambient lighting normalization under multi-colored illumination. It observes that DINOv3 self-supervised features are consistent across colored-light inputs and ambient ground truth, then introduces DINO Omni-layer Guidance (D.O.G.) to inject multi-layer DINOv3 features into encoder stages, plus color-frequency refinement blocks (BFACG + SFFB) in the decoder. On the CL3AN dataset it reports a +1.22 dB PSNR gain over prior methods, 3rd place on the NTIRE 2026 ALN Color Lighting Challenge, and 2nd place in fidelity on the White Lighting track with the lowest FID.
Significance. If the central claim holds, the work shows that off-the-shelf self-supervised ViT features can serve as illumination-robust semantic priors for low-level normalization tasks without task-specific fine-tuning. The challenge rankings provide external validation of generalization across chromatic and luminance-dominant conditions, and the parameter-free nature of the DINO prior is a clear strength.
major comments (3)
- [Experiments] Experiments section: the abstract reports a +1.22 dB PSNR gain and challenge rankings, yet no error bars, multiple-run statistics, or ablation tables are referenced; without these the robustness of the improvement cannot be assessed and the central claim remains under-supported.
- [Section 3.1] Motivation and Section 3.1: the claim that DINOv3 features remain highly consistent between colored-light inputs and ambient ground truth is stated qualitatively but never quantified (no cosine similarity, CKA, or layer-wise distance metrics are provided), so the motivation for D.O.G. injection rests on an unverified assumption.
- [Ablation studies] Ablation studies: no experiment removes D.O.G. while retaining BFACG + SFFB (or vice versa); the end-to-end PSNR gain therefore does not isolate whether the DINO priors contribute meaningfully or whether the color-frequency refinement blocks alone suffice.
minor comments (2)
- [Abstract] Abstract: the acronyms D.O.G., BFACG, and SFFB appear without parenthetical expansions on first use.
- [Figure 1] Figure 1: the architecture diagram would benefit from explicit arrows or labels indicating the precise encoder stages where D.O.G. features are injected.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that the experimental section requires stronger statistical support and more complete ablations to substantiate the claims. We will revise the manuscript to address all three points by adding the requested analyses and metrics.
Point-by-point responses
Referee: [Experiments] Experiments section: the abstract reports a +1.22 dB PSNR gain and challenge rankings, yet no error bars, multiple-run statistics, or ablation tables are referenced; without these the robustness of the improvement cannot be assessed and the central claim remains under-supported.
Authors: We agree that reporting error bars and multiple-run statistics is necessary to demonstrate robustness. In the revised manuscript we will add mean PSNR and standard deviation computed over five independent training runs for the CL3AN results, include error bars on the main comparison table, and clarify that the challenge rankings reflect a single fixed submission. revision: yes
Referee: [Section 3.1] Motivation and Section 3.1: the claim that DINOv3 features remain highly consistent between colored-light inputs and ambient ground truth is stated qualitatively but never quantified (no cosine similarity, CKA, or layer-wise distance metrics are provided), so the motivation for D.O.G. injection rests on an unverified assumption.
Authors: We acknowledge that the consistency claim was presented only qualitatively. We will add a new quantitative analysis subsection in the revised Section 3.1 that reports layer-wise cosine similarity and CKA scores between DINOv3 features of colored-light inputs and their ambient ground-truth counterparts, thereby providing empirical grounding for the D.O.G. design. revision: yes
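Linear CKA, one of the metrics the rebuttal promises, compares representations in a way that is invariant to rotation and scaling of the feature space. A minimal sketch for (samples, dim) feature matrices (names and test values are illustrative):

```python
import numpy as np

def linear_cka(x, y):
    """Linear Centered Kernel Alignment between (n, d1) and (n, d2) features."""
    x = x - x.mean(axis=0)  # center each feature dimension
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, 'fro') ** 2
    den = np.linalg.norm(x.T @ x, 'fro') * np.linalg.norm(y.T @ y, 'fro')
    return num / den

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])
print(round(linear_cka(x, 2.0 * x), 6))  # 1.0 (scale-invariant)
```

Scores near 1 between colored-light and ambient features would quantify the consistency the motivation currently asserts.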
Referee: [Ablation studies] Ablation studies: no experiment removes D.O.G. while retaining BFACG + SFFB (or vice versa); the end-to-end PSNR gain therefore does not isolate whether the DINO priors contribute meaningfully or whether the color-frequency refinement blocks alone suffice.
Authors: We agree that the current ablations do not fully isolate the contribution of each component. We will add a new ablation table in the revised manuscript that includes a variant with D.O.G. removed while retaining BFACG and SFFB, reporting the resulting PSNR to quantify the incremental benefit of the DINO priors. revision: yes
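The multi-run reporting promised in these responses amounts to computing PSNR per run and summarizing with a mean and standard deviation. A minimal sketch (the per-run scores below are hypothetical placeholders, not results from the paper):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Single-pair example: constant 0.5 error -> MSE 0.25 -> 10*log10(4) dB.
print(round(psnr(0.5 * np.ones((4, 4)), np.ones((4, 4))), 2))  # 6.02

# Hypothetical per-run scores; report mean +/- sample std over runs.
runs = np.array([31.9, 32.1, 32.0])
print(f"{runs.mean():.2f} +/- {runs.std(ddof=1):.2f} dB")
```

With only a handful of runs, the sample standard deviation (ddof=1) is the appropriate error bar; a +1.22 dB gain is convincing only if it comfortably exceeds this spread.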
Circularity Check
No circularity: external DINOv3 prior and independent architectural modules
Full rationale
The derivation begins with an empirical observation of DINOv3 feature consistency (external model, not fitted on target lighting data) and proceeds to architectural choices (D.O.G. injection, BFACG + SFFB refinement). No equations reduce a prediction to a fitted input by construction, no self-citations load-bear the central claim, and no ansatz or uniqueness theorem is smuggled in. The reported PSNR gains and challenge rankings are end-to-end empirical results on CL3AN and NTIRE tracks, not self-referential definitions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: DINOv3 self-supervised features remain highly consistent between colored-light inputs and ambient-lit ground truth.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (tagged unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "We observe that DINOv3's self-supervised features remain highly consistent between colored-light inputs and ambient-lit ground truth, motivating their use as illumination-robust semantic priors."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean — reality_from_one_distinction (tagged unclear)
The relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "DINO Omni-layer Guidance (D.O.G.) ... color-frequency refinement design (BFACG + SFFB)"
What do these tags mean?
- matches — The paper's claim is directly supported by a theorem in the formal canon.
- supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses — The paper appears to rely on the theorem as machinery.
- contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
- unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.