
arxiv: 2511.14183 · v3 · submitted 2025-11-18 · 💻 cs.CV

UniSER: A Foundation Model for Unified Soft Effects Removal

Pith reviewed 2026-05-17 21:16 UTC · model grok-4.3

classification 💻 cs.CV
keywords image restoration · soft effects · lens flare · haze · shadow removal · reflection removal · diffusion transformer · unified model

The pith

A single diffusion model can remove lens flare, haze, shadows, and reflections by treating them all as semi-transparent occlusions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that soft effects in images share a common structure as partial, semi-transparent layers over the scene. Rather than developing separate specialist models for each degradation type, the authors build one versatile model that learns shared restoration priors from a combined dataset. They create 3.8 million training pairs, including new physically plausible synthetic examples, then fine-tune a Diffusion Transformer using mask and strength controls. If the approach holds, it would let a single network deliver high-fidelity results across mixed or unknown degradations while keeping the underlying scene intact.

Core claim

By modeling diverse soft effects as semi-transparent occlusions and training a Diffusion Transformer on a large unified dataset of 3.8 million pairs with added physically plausible examples, the UniSER model learns robust priors that enable high-fidelity removal of multiple degradations within one framework, outperforming both specialized and prompt-based generalist models.

What carries the argument

UniSER, a Diffusion Transformer fine-tuned on mixed soft-effect data with integrated fine-grained mask and strength controls to capture shared semi-transparent occlusion priors.
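
To make the "semi-transparent occlusion" framing concrete, here is a minimal sketch that assumes the standard alpha-compositing model, in which the observed pixel is a per-pixel blend of the scene and an effect layer. The source does not state the paper's exact formation model, so the functions `composite` and `recover_scene`, the grayscale list-of-lists image type, and all values below are illustrative assumptions rather than the authors' method.

```python
from typing import List

Image = List[List[float]]  # hypothetical grayscale image type, values in [0, 1]


def composite(scene: Image, effect: Image, alpha: Image) -> Image:
    """Overlay a semi-transparent effect layer on a scene.

    Assumed model: observed = (1 - alpha) * scene + alpha * effect, per pixel.
    """
    return [
        [(1.0 - a) * s + a * e for s, e, a in zip(s_row, e_row, a_row)]
        for s_row, e_row, a_row in zip(scene, effect, alpha)
    ]


def recover_scene(observed: Image, effect: Image, alpha: Image,
                  eps: float = 1e-6) -> Image:
    """Invert the blend when the effect layer and alpha are known.

    In real restoration neither is known, which is why a learned prior is needed.
    """
    return [
        [(o - a * e) / max(1.0 - a, eps) for o, e, a in zip(o_row, e_row, a_row)]
        for o_row, e_row, a_row in zip(observed, effect, alpha)
    ]


# Toy 2x2 example with made-up values.
scene = [[0.2, 0.8], [0.5, 0.1]]
effect = [[1.0, 1.0], [1.0, 1.0]]    # uniform bright veil
alpha = [[0.0, 0.5], [0.25, 0.75]]   # opacity of the veil at each pixel

observed = composite(scene, effect, alpha)
restored = recover_scene(observed, effect, alpha)
```

Because alpha stays below 1 in this toy case, the scene remains partially visible and is exactly recoverable given the layer. The restoration problem the paper targets is the harder blind version, where the model must infer both the layer and its opacity.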

If this is right

  • Separate specialist models for each soft effect become unnecessary.
  • Restoration succeeds on wild images that contain several soft effects at once.
  • Scene identity is preserved more reliably than with prompt-driven general editing tools.
  • New physically plausible synthetic data fills gaps that limit current benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same occlusion-based modeling could extend to time sequences for consistent video restoration.
  • Mask and strength controls might support interactive editing where users adjust only part of an image.
  • Similar unified training could address other partial-layer degradations such as light rain or thin fog.
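
The intended semantics of mask and strength controls can be illustrated with a small sketch. It applies the controls as a post-hoc blend between the degraded input and a fully restored output. This is an assumption made for illustration only: UniSER is described as taking mask and strength as model inputs, and the source does not specify how they are encoded. The function name `apply_controls` and all values are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical grayscale image type, values in [0, 1]


def apply_controls(degraded: Image, restored: Image, mask: Image,
                   strength: float) -> Image:
    """Blend a restored image into the degraded input.

    mask:     per-pixel weight in [0, 1]; 0 leaves the pixel untouched.
    strength: scalar in [0, 1]; 0 keeps the input, 1 applies full removal.
    Illustrative blend only, not the conditioning mechanism of the model.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return [
        [d + strength * m * (r - d) for d, r, m in zip(d_row, r_row, m_row)]
        for d_row, r_row, m_row in zip(degraded, restored, mask)
    ]


# Toy 1x2 example with made-up values: only the left pixel is selected.
degraded = [[0.9, 0.9]]
restored = [[0.3, 0.3]]
mask = [[1.0, 0.0]]

partial = apply_controls(degraded, restored, mask, strength=0.5)
full = apply_controls(degraded, restored, mask, strength=1.0)
```

With `strength=0.5` the selected pixel moves halfway toward the restored value while the unselected pixel is unchanged, which is the behavior a regional, adjustable edit would need.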

Load-bearing premise

That the shared semi-transparent occlusion property is sufficient for one model to learn accurate restoration across all soft-effect types without losing performance on any single type.

What would settle it

Real-world test images containing only one soft effect type, such as isolated lens flare, where the unified model produces more visible artifacts or lower fidelity than a specialist model trained exclusively on that effect.
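
One way to run such a check is to compare fidelity per effect type between the unified model and a specialist on the same reference images. The sketch below uses PSNR as the fidelity measure. The helper names `psnr` and `psnr_gap_per_effect`, the image type, and the toy pixel values are assumptions for illustration and are not results from the paper.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]  # hypothetical grayscale image type, values in [0, 1]


def psnr(reference: Image, estimate: Image, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    squared_errors = [
        (r - e) ** 2
        for r_row, e_row in zip(reference, estimate)
        for r, e in zip(r_row, e_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_gap_per_effect(
    samples: Dict[str, List[Tuple[Image, Image, Image]]]
) -> Dict[str, float]:
    """Mean PSNR of the unified model minus mean PSNR of the specialist.

    Each sample is (reference, unified_output, specialist_output).
    A negative gap means the specialist is ahead on that effect type.
    Assumes no output is pixel-identical to its reference.
    """
    gaps = {}
    for effect, triples in samples.items():
        unified = [psnr(ref, uni) for ref, uni, _ in triples]
        specialist = [psnr(ref, spec) for ref, _, spec in triples]
        gaps[effect] = sum(unified) / len(unified) - sum(specialist) / len(specialist)
    return gaps


# Made-up 1x2 images, chosen only to exercise the code.
samples = {
    "lens_flare": [([[0.5, 0.5]], [[0.6, 0.6]], [[0.51, 0.51]])],
}
gaps = psnr_gap_per_effect(samples)
```

A consistently negative gap on single-effect real images would be the kind of evidence that counts against the unified model. A fuller check would also report SSIM, LPIPS, and user preference.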

Figures

Figures reproduced from arXiv: 2511.14183 by Connelly Barnes, Eli Shechtman, Haoran You, Jingdong Zhang, Lingzhi Zhang, Mang Tik Chiu, Qing Liu, Sohrab Amirghodsi, Wenping Wang, Xiaohang Zhan, Xiaoyang Liu, Xin Li, Yizhou Wang, Yuqian Zhou, Zhe Lin.

Figure 1: Our UniSER eliminates multiple challenging (a) and even undefined (b) soft effects from in-the-wild images while preserving
Figure 2: Visualization of our curated data samples and synthetic haze by our method.
Figure 3: The architecture of UniSER. During training, the mask is randomly synthesized along with a scalar strength, and the supervision
Figure 4: Comparisons with state-of-the-art specialist and generalist models on in-the-wild testing data. For effect removal, our method
Figure 5: (a) Illustration of Strength Control for effect removal. (b) Illustration of Mask Control for accurate user regional editing. (c)
Figure 6: Visualization of our synthetic haze generated by our proposed pipeline. Our method is capable of synthesizing multiple
Figure 7: Contrast maps of image before and after edit by UniSER. Significant enhancements of contrast inside effect regions are observed,
Figure 8: Gallery: Removing effects with UniSER.
Figure 9: Gallery: Removing effects with UniSER.
Figure 10: Gallery: Adding or enhancing effects with UniSER.
Original abstract

Digital images are often degraded by soft effects such as lens flare, haze, shadows, and reflections, which reduce aesthetics even though the underlying pixels remain partially visible. The prevailing works address these degradations in isolation, developing highly specialized, specialist models that lack scalability and fail to exploit the shared underlying essences of these restoration problems. Meanwhile, although recent large-scale generalist models (e.g., GPT-4o, Flux Kontext, Nano Banana) offer powerful text-driven editing capabilities, they heavily rely on detailed prompts and often fail to achieve robust removal on such fine-grained tasks while preserving the scene's identity. Leveraging the common essence of soft effects, i.e., semi-transparent occlusions, we introduce a foundational versatile model UniSER, capable of addressing diverse degradations caused by soft effects within a single framework. Our methodology centers on curating a massive 3.8M-pair dataset to ensure robustness and generalization, which includes novel, physically-plausible data to fill critical gaps in public benchmarks, and a tailored training pipeline that fine-tunes a Diffusion Transformer to learn robust restoration priors from this diverse data, integrating fine-grained mask and strength controls. This synergistic approach allows UniSER to significantly outperform both specialist and generalist models, achieving robust, high-fidelity restoration in the wild.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents UniSER, a unified foundation model for removing soft effects including lens flare, haze, shadows, and reflections from images. It treats these degradations as sharing the common essence of semi-transparent occlusions, curates a 3.8M-pair dataset that incorporates novel physically-plausible synthetic pairs to address gaps in public benchmarks, and fine-tunes a Diffusion Transformer with added mask and strength control inputs to learn shared restoration priors, claiming significant outperformance over both specialist models and generalist text-driven editors such as GPT-4o while preserving scene identity.

Significance. If the empirical claims hold, the work would be significant for image restoration by demonstrating that a single scalable model can replace multiple specialist networks for related degradations, while also improving on prompt-reliant generalists for fine-grained tasks. The scale of the curated dataset and the introduction of physically-plausible synthetic data constitute concrete strengths that could help close data gaps; machine-checked elements are absent but the approach is falsifiable via the reported comparisons.

major comments (3)
  1. [Abstract and §3 (Methodology)] The central claim that the shared semi-transparent occlusion inductive bias suffices for robust priors across distinct physics is load-bearing, yet the architecture description adds only mask and strength controls to the Diffusion Transformer without explicit per-effect physical modules. Given material differences in image formation (volumetric scattering for haze, optical diffraction for lens flare, geometric penumbra for shadows, viewpoint-dependent reflection), the risk of averaged restoration strategies must be directly tested via per-degradation ablations or visualizations showing no accuracy loss relative to specialists.
  2. [§4 (Dataset Curation)] The assertion that the 3.8M-pair dataset including novel physically-plausible synthetics fills critical gaps requires quantitative validation. Details on synthetic generation parameters, distribution statistics per effect type, and explicit checks against real-world shift (e.g., via FID or perceptual metrics on held-out real images) are needed to confirm the data supports the unified prior rather than introducing new biases.
  3. [§5 (Experiments)] The abstract states significant outperformance over specialist and generalist models, but the evaluation must include full quantitative tables with metrics (PSNR, SSIM, LPIPS, user studies), exact baseline implementations, and cross-effect ablations. Without these, the superiority claim on every individual degradation type remains unsupported.
minor comments (2)
  1. [§3] Clarify the precise conditioning mechanism for mask and strength inputs within the Diffusion Transformer (e.g., which layers receive the controls and how they are encoded) to aid reproducibility.
  2. [References] Add missing references for the cited generalist models (Flux Kontext, Nano Banana) and ensure comparison protocols are described consistently with prior specialist papers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to strengthen the presentation of our claims and provide point-by-point responses below.

  1. Referee: [Abstract and §3] The central claim that the shared semi-transparent occlusion inductive bias suffices for robust priors across distinct physics is load-bearing, yet the architecture description adds only mask and strength controls to the Diffusion Transformer without explicit per-effect physical modules. Given material differences in image formation (volumetric scattering for haze, optical diffraction for lens flare, geometric penumbra for shadows, viewpoint-dependent reflection), the risk of averaged restoration strategies must be directly tested via per-degradation ablations or visualizations showing no accuracy loss relative to specialists.

    Authors: We agree that explicit verification of the shared inductive bias is important to rule out averaged strategies. The original manuscript demonstrated overall gains via specialist comparisons, but to directly address this concern we have added per-degradation quantitative breakdowns and qualitative visualizations in the revised Section 5 and supplementary material. These results show that UniSER matches or exceeds specialist performance on each individual effect with no observable averaging artifacts, supporting the sufficiency of the semi-transparent occlusion prior across differing image-formation processes. revision: yes

  2. Referee: [§4] The assertion that the 3.8M-pair dataset including novel physically-plausible synthetics fills critical gaps requires quantitative validation. Details on synthetic generation parameters, distribution statistics per effect type, and explicit checks against real-world shift (e.g., via FID or perceptual metrics on held-out real images) are needed to confirm the data supports the unified prior rather than introducing new biases.

    Authors: We concur that additional quantitative characterization of the dataset would improve transparency. In the revised §4 we have included the synthetic generation parameters, per-effect distribution statistics, and new experiments reporting FID and perceptual metrics on held-out real images. These checks confirm limited domain shift and indicate that the curated data supports learning of the unified restoration prior. revision: yes

  3. Referee: [§5] The abstract states significant outperformance over specialist and generalist models, but the evaluation must include full quantitative tables with metrics (PSNR, SSIM, LPIPS, user studies), exact baseline implementations, and cross-effect ablations. Without these, the superiority claim on every individual degradation type remains unsupported.

    Authors: We acknowledge that fuller reporting of the experimental results is warranted. The revised Section 5 now contains complete quantitative tables with PSNR, SSIM, LPIPS, and user-study results, together with explicit descriptions of baseline implementations and cross-effect ablations. These additions substantiate the per-degradation performance claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical model training on curated data with no derivation chain


The paper describes curating a 3.8M-pair dataset (including novel physically-plausible synthetics) and fine-tuning a Diffusion Transformer with mask/strength controls to learn restoration priors for soft effects treated as semi-transparent occlusions. All claims rest on empirical training, generalization, and comparisons to specialist/generalist models. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided abstract or methodology summary. The central result is a trained model evaluated on held-out data, which is self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach depends on the assumption that soft effects share a learnable semi-transparent occlusion structure, on the quality and coverage of the 3.8M-pair dataset (including novel synthetic data), and on standard diffusion model training assumptions. No independent verification of these elements is possible from the abstract alone.

free parameters (2)
  • mask and strength control inputs
    Fine-grained controls integrated during fine-tuning; their exact parameterization and scaling are chosen to enable user control but are not derived from first principles.
  • training hyperparameters for Diffusion Transformer
    Learning rate, batch size, number of steps, and other optimization choices required to fine-tune the base model on the curated data.
axioms (2)
  • domain assumption: Diffusion Transformer architecture can learn robust restoration priors from paired clean-degraded data
    Invoked when stating that fine-tuning on the 3.8M dataset produces a versatile model.
  • ad hoc to paper: The curated dataset, including novel physically-plausible pairs, sufficiently covers real-world soft-effect distributions
    Central to the claim of robustness and generalization beyond public benchmarks.

pith-pipeline@v0.9.0 · 5577 in / 1703 out tokens · 46327 ms · 2026-05-17T21:16:12.180583+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond Thinking: Imagining in 360° for Humanoid Visual Search

    cs.CV 2026-05 unverdicted novelty 6.0

    Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency i...

  2. DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models

    cs.CV 2026-04 unverdicted novelty 6.0

    DiffHDR converts LDR videos to HDR by formulating the task as generative radiance inpainting in a video diffusion model's latent space, using Log-Gamma encoding and synthesized training data to achieve better fidelity...

Reference graph

Works this paper leans on

98 extracted references · 98 canonical work pages · cited by 2 Pith papers · 7 internal anchors

  1. [1]

    I-haze: A dehazing bench- mark with real hazy and haze-free indoor images

    Cosmin Ancuti, Codruta O Ancuti, Radu Timofte, and Christophe De Vleeschouwer. I-haze: A dehazing bench- mark with real hazy and haze-free indoor images. InInterna- tional conference on advanced concepts for intelligent vision systems, pages 620–631. Springer, 2018. 4, 1

  2. [2]

    O-haze: a dehazing bench- mark with real hazy and haze-free outdoor images

    Codruta O Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. O-haze: a dehazing bench- mark with real hazy and haze-free outdoor images. InPro- ceedings of the IEEE conference on computer vision and pat- tern recognition workshops, pages 754–762, 2018. 1

  3. [3]

    Dense-haze: A benchmark for image dehazing with dense-haze and haze-free images

    Codruta O Ancuti, Cosmin Ancuti, Mateu Sbert, and Radu Timofte. Dense-haze: A benchmark for image dehazing with dense-haze and haze-free images. In2019 IEEE interna- tional conference on image processing (ICIP), pages 1014–

  4. [4]

    Nh-haze: An image dehazing benchmark with non- homogeneous hazy and haze-free images

    Codruta O Ancuti, Cosmin Ancuti, and Radu Timo- fte. Nh-haze: An image dehazing benchmark with non- homogeneous hazy and haze-free images. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pages 444–445, 2020. 1

  5. [5]

    Ntire 2021 nonhomogeneous dehazing challenge report

    Codruta O Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, and Radu Timofte. Ntire 2021 nonhomogeneous dehazing challenge report. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 627–646, 2021

  6. [6]

    Ntire 2023 hr non- homogeneous dehazing challenge report

    Codruta O Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Han Zhou, Wei Dong, Yangyi Liu, Jun Chen, Huan Liu, Liangyan Li, et al. Ntire 2023 hr non- homogeneous dehazing challenge report. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1808–1825, 2023

  7. [7]

    Ntire 2024 dense and non-homogeneous dehazing challenge report

    Codruta O Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Yidi Liu, Xingbo Wang, Yurui Zhu, Gege Shi, Xin Lu, Xueyang Fu, et al. Ntire 2024 dense and non-homogeneous dehazing challenge report. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6453–6468, 2024. 4, 1

  8. [8]

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2. 5-vl technical report.arXiv preprint arXiv:2502.13923, 2025. 3, 6

  9. [9]

    Blender—a 3d modelling and rendering package.Blender Foundation, 2018

    D Blender Online Community. Blender—a 3d modelling and rendering package.Blender Foundation, 2018. 3

  10. [10]

    In- structpix2pix: Learning to follow image editing instructions

    Tim Brooks, Aleksander Holynski, and Alexei A Efros. In- structpix2pix: Learning to follow image editing instructions. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 18392–18402, 2023. 3

  11. [11]

    Seedream-4: A large-scale text-to-image gener- ation model, 2024

    ByteDance. Seedream-4: A large-scale text-to-image gener- ation model, 2024. 6

  12. [12]

    Gated context aggregation network for image dehazing and derain- ing

    Dongdong Chen, Mingming He, Qingnan Fan, Jing Liao, Li- heng Zhang, Dongdong Hou, Lu Yuan, and Gang Hua. Gated context aggregation network for image dehazing and derain- ing. In2019 IEEE winter conference on applications of com- puter vision (WACV), pages 1375–1383. IEEE, 2019. 2, 3, 6

  13. [13]

    Unirestore: Unified perceptual and task-oriented image restoration model using diffusion prior

    I Chen, Wei-Ting Chen, Yu-Wei Liu, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang, et al. Unirestore: Unified perceptual and task-oriented image restoration model using diffusion prior. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 17969–17979, 2025. 2, 3

  14. [14]

    Unireal: Universal image generation and editing via learning real-world dynamics

    Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, et al. Unireal: Universal image generation and editing via learning real-world dynamics. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12501–12511, 2025. 4

  15. [15]

    Psd: Principled synthetic-to-real dehazing guided by physi- cal priors

    Zeyuan Chen, Yangchao Wang, Yang Yang, and Dong Liu. Psd: Principled synthetic-to-real dehazing guided by physi- cal priors. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7180–7189,

  16. [16]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blis- tein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261, 2025. 2, 3

  17. [17]

    Flare7k: A phenomenological night- time flare removal dataset.Advances in Neural Information Processing Systems, 35:3926–3937, 2022

    Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, and Chen Change Loy. Flare7k: A phenomenological night- time flare removal dataset.Advances in Neural Information Processing Systems, 35:3926–3937, 2022. 3, 6, 7

  18. [18]

    Nighttime smartphone reflective flare re- moval using optical center symmetry prior

    Yuekun Dai, Yihang Luo, Shangchen Zhou, Chongyi Li, and Chen Change Loy. Nighttime smartphone reflective flare re- moval using optical center symmetry prior. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20783–20791, 2023. 6

  19. [19]

    Mipi 2024 challenge on nighttime flare removal: Methods and re- sults.arXiv preprint arXiv:2404.19534, 2024

    Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, et al. Mipi 2024 challenge on nighttime flare removal: Methods and re- sults.arXiv preprint arXiv:2404.19534, 2024. 3, 4, 1

  20. [20]

    Shadowrefiner: Towards mask-free shadow removal via fast fourier transformer

    Wei Dong, Han Zhou, Yuqiong Tian, Jingke Sun, Xiaohong Liu, Guangtao Zhai, and Jun Chen. Shadowrefiner: Towards mask-free shadow removal via fast fourier transformer. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 6208–6217, 2024. 2, 3, 6

  21. [21]

    Location-aware single image reflection re- 9 moval

    Zheng Dong, Ke Xu, Yin Yang, Hujun Bao, Weiwei Xu, and Rynson WH Lau. Location-aware single image reflection re- 9 moval. InProceedings of the IEEE/CVF international con- ference on computer vision, pages 5017–5026, 2021. 3

  22. [22]

    Cycle- dehaze: Enhanced cyclegan for single image dehazing

    Deniz Engin, Anil Genc ¸, and Hazim Kemal Ekenel. Cycle- dehaze: Enhanced cyclegan for single image dehazing. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 825–833, 2018. 2, 3

  23. [23]

    A generic deep architecture for single image re- flection removal and image smoothing

    Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, and David Wipf. A generic deep architecture for single image re- flection removal and image smoothing. InProceedings of the IEEE International Conference on Computer Vision, pages 3238–3247, 2017. 3

  24. [24]

    Auto- exposure fusion for single-image shadow removal

    Lan Fu, Changqing Zhou, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Wei Feng, Yang Liu, and Song Wang. Auto- exposure fusion for single-image shadow removal. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10571–10580, 2021. 3

  25. [25]

    Deflare-net: Flare detection and removal network

    Allabakash Ghodesawar, Vinod Patil, Ankit Raichur, Swaroop Adrashyappanamath, Sampada Malagi, Nikhil Akalwadi, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, and Uma Mudenagudi. Deflare-net: Flare detection and removal network. InInternational Conference on Pat- tern Recognition and Machine Intelligence, pages 465–472. Springer, 2023. 2, 3

  26. [26]

    Introducing gemini 2.5 flash image, our state-of- the-art image model, 2025

    Google. Introducing gemini 2.5 flash image, our state-of- the-art image model, 2025. 2, 6

  27. [27]

    Shadowformer: Global context helps shadow removal

    Lanqing Guo, Siyu Huang, Ding Liu, Hao Cheng, and Bihan Wen. Shadowformer: Global context helps shadow removal. InProceedings of the AAAI conference on artificial intelli- gence, pages 710–718, 2023. 6

  28. [28]

    Shadowd- iffusion: When degradation prior meets diffusion model for shadow removal

    Lanqing Guo, Chong Wang, Wenhan Yang, Siyu Huang, Yufei Wang, Hanspeter Pfister, and Bihan Wen. Shadowd- iffusion: When degradation prior meets diffusion model for shadow removal. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 14049–14058, 2023. 6

  29. [29]

    Single-image shadow detection and removal using paired regions

    Ruiqi Guo, Qieyun Dai, and Derek Hoiem. Single-image shadow detection and removal using paired regions. InCVPR 2011, pages 2033–2040. IEEE, 2011. 3

  30. [30]

    Single image haze removal using dark channel prior.IEEE transactions on pat- tern analysis and machine intelligence, 33(12):2341–2353,

    Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior.IEEE transactions on pat- tern analysis and machine intelligence, 33(12):2341–2353,

  31. [31]

    Disentan- gle nighttime lens flares: self-supervised generation-based lens flare removal

    Yuwen He, Wei Wang, Wanyu Wu, and Kui Jiang. Disentan- gle nighttime lens flares: self-supervised generation-based lens flare removal. InProceedings of the AAAI Conference on Artificial Intelligence, pages 3464–3472, 2025. 2

  32. [32]

    Prompt-to-Prompt Image Editing with Cross Attention Control

    Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aber- man, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross attention control.(2022).URL https://arxiv. org/abs/2208.01626, 3, 2022. 3

  33. [33]

    Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020. 8

  34. [34]

    L-differ: Single image reflection re- moval with language-based diffusion model

    Yuchen Hong, Haofeng Zhong, Shuchen Weng, Jinxiu Liang, and Boxin Shi. L-differ: Single image reflection re- moval with language-based diffusion model. InEuropean Conference on Computer Vision, pages 58–76. Springer,

  35. [35]

    Trash or treasure? an interac- tive dual-stream strategy for single image reflection separa- tion.Advances in Neural Information Processing Systems, 34:24683–24694, 2021

    Qiming Hu and Xiaojie Guo. Trash or treasure? an interac- tive dual-stream strategy for single image reflection separa- tion.Advances in Neural Information Processing Systems, 34:24683–24694, 2021. 7

  36. [36]

    Single image reflection sep- aration via component synergy

    Qiming Hu and Xiaojie Guo. Single image reflection sep- aration via component synergy. InProceedings of the IEEE/CVF international conference on computer vision, pages 13138–13147, 2023. 7

  37. [37]

    Single image reflection removal via inter-layer complementarity

    Yue Huang, Zi’ang Li, Tianle Hu, Jie Wen, Guanbin Li, Jinglin Zhang, Guoxu Zhou, and Xiaozhao Fang. Single image reflection removal via inter-layer complementarity. arXiv preprint arXiv:2505.12641, 2025. 3

  38. [38]

    GPT-4o System Card

    Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perel- man, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Weli- hinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024. 2, 3, 6

  39. [39]

    Hazespace2m: A dataset for haze aware single image dehazing

    Md Tanvir Islam, Nasir Rahim, Saeed Anwar, Muhammad Saqib, Sambit Bakshi, and Khan Muhammad. Hazespace2m: A dataset for haze aware single image dehazing. InProceed- ings of the 32nd ACM International Conference on Multime- dia, pages 9155–9164, 2024. 3, 4, 1

  40. [40]

    Autodir: Automatic all-in-one image restoration with latent diffusion

    Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, and Jinwei Gu. Autodir: Automatic all-in-one image restoration with latent diffusion. InEuropean Conference on Computer Vi- sion, pages 340–359. Springer, 2024. 3

  41. [41]

    Dc- shadownet: Single-image hard and soft shadow removal us- ing unsupervised domain-classifier guided network

    Yeying Jin, Aashish Sharma, and Robby T Tan. Dc- shadownet: Single-image hard and soft shadow removal us- ing unsupervised domain-classifier guided network. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 5027–5036, 2021. 6

  42. [42]

    Imagic: Text-based real image editing with diffusion models

    Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6007–6017, 2023. 3

  43. [43]

    Marigold: Affordable adaptation of diffusion- based image generators for image analysis.arXiv preprint arXiv:2505.09358, 2025

    Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. Marigold: Affordable adaptation of diffusion- based image generators for image analysis.arXiv preprint arXiv:2505.09358, 2025. 4, 2

  44. [44]

    Single image reflection removal with physically-based training images

    Soomin Kim, Yuchi Huo, and Sung-Eui Yoon. Single image reflection removal with physically-based training images. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 5164–5173, 2020. 3

  45. [45]

    Auto-Encoding Variational Bayes

    Diederik P Kingma and Max Welling. Auto-encoding varia- tional bayes.arXiv preprint arXiv:1312.6114, 2013. 4

  46. [46]

    Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dock- horn, Jack English, Zion English, Patrick Esser, et al. Flux. 1 kontext: Flow matching for in-context image generation and editing in latent space.arXiv preprint arXiv:2506.15742,

  47. [47]

    Shadow removal via shadow image decomposition

    Hieu Le and Dimitris Samaras. Shadow removal via shadow image decomposition. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 8578– 8587, 2019. 2, 3

  48. [48]

    Robust reflection removal with reflection-free flash-only cues

    Chenyang Lei and Qifeng Chen. Robust reflection removal with reflection-free flash-only cues. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14811–14820, 2021.

  49. [49]

    Polarized reflection removal with perfect alignment in the wild

    Chenyang Lei, Xuhua Huang, Mengdi Zhang, Qiong Yan, Wenxiu Sun, and Qifeng Chen. Polarized reflection removal with perfect alignment in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1750–1758, 2020.

  50. [50]

    Aod-net: All-in-one dehazing network

    Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE international conference on computer vision, pages 4770–4778, 2017.

  51. [51]

    Benchmarking single-image dehazing and beyond

    Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE transactions on image processing, 28(1):492–505, 2018.

  52. [52]

    Single image reflection removal through cascaded refinement

    Chao Li, Yixiao Yang, Kun He, Stephen Lin, and John E Hopcroft. Single image reflection removal through cascaded refinement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3565–3574,

  53. [53]

    All in one bad weather removal using architectural search

    Ruoteng Li, Robby T Tan, and Loong-Fah Cheong. All in one bad weather removal using architectural search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3175–3185, 2020.

  54. [54]

    Densesr: Image shadow removal as dense prediction

    Yu-Fan Lin, Chia-Ming Lee, and Chih-Chung Hsu. Densesr: Image shadow removal as dense prediction. arXiv preprint arXiv:2507.16472, 2025.

  55. [55]

    Diff-plugin: Revitalizing details for diffusion-based low-level tasks

    Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, and Rynson WH Lau. Diff-plugin: Revitalizing details for diffusion-based low-level tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4197–4208, 2024.

  56. [56]

    From shadow generation to shadow removal

    Zhihao Liu, Hui Yin, Xinyi Wu, Zhenyao Wu, Yang Mi, and Song Wang. From shadow generation to shadow removal. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4927–4936, 2021.

  57. [57]

    Pytorch: An imperative style, high-performance deep learning library

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.

  58. [58]

    Promptir: Prompting for all-in-one image restoration

    Vaishnav Potlapalli, Syed Waqas Zamir, Salman H Khan, and Fahad Shahbaz Khan. Promptir: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems, 36:71275–71293, 2023.

  59. [59]

    Deshadownet: A multi-context embedding deep network for shadow removal

    Liangqiong Qu, Jiandong Tian, Shengfeng He, Yandong Tang, and Rynson WH Lau. Deshadownet: A multi-context embedding deep network for shadow removal. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4067–4075, 2017.

  60. [60]

    Unsupervised single-image reflection removal

    Hamed RahmaniKhezri, Suhong Kim, and Mohamed Hefeeda. Unsupervised single-image reflection removal. IEEE Transactions on Multimedia, 25:4958–4971, 2022.

  61. [61]

    Awracle: All-weather image restoration using visual in-context learning

    Sudarshan Rajagopalan and Vishal M Patel. Awracle: All-weather image restoration using visual in-context learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6675–6683, 2025.

  62. [62]

    Dragdiffusion: Harnessing diffusion models for interactive point-based image editing

    Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent YF Tan, and Song Bai. Dragdiffusion: Harnessing diffusion models for interactive point-based image editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8839–8849, 2024.

  63. [63]

    Denoising Diffusion Implicit Models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.

  64. [64]

    Vision transformers for single image dehazing

    Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023.

  65. [65]

    Resolution-robust large mask inpainting with fourier convolutions

    Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 2149–2159, 2022.

  66. [66]

    Degradation-aware feature perturbation for all-in-one image restoration

    Xiangpeng Tian, Xiangyu Liao, Xiao Liu, Meng Li, and Chao Ren. Degradation-aware feature perturbation for all-in-one image restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 28165–28175, 2025.

  67. [67]

    Sfnet-a spatial-frequency domain neural network for image lens flare removal

    Florin Vasluianu, Zongwei Wu, and Radu Timofte. Sfnet-a spatial-frequency domain neural network for image lens flare removal. In 2024 IEEE International Conference on Image Processing (ICIP), pages 1711–1717. IEEE, 2024.

  68. [68]

    Wsrd: A novel benchmark for high resolution image shadow removal

    Florin-Alexandru Vasluianu, Tim Seizinger, and Radu Timofte. Wsrd: A novel benchmark for high resolution image shadow removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1826–1835, 2023.

  69. [69]

    Benchmarking single-image reflection removal algorithms

    Renjie Wan, Boxin Shi, Ling-Yu Duan, Ah-Hwee Tan, and Alex C Kot. Benchmarking single-image reflection removal algorithms. In Proceedings of the IEEE international conference on computer vision, pages 3922–3930, 2017.

  70. [70]

    Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal

    Jifeng Wang, Xiang Li, and Jian Yang. Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1788–1797, 2018.

  71. [71]

    Learning hazing to dehazing: Towards realistic haze generation for real-world image dehazing

    Ruiyi Wang, Yushuo Zheng, Zicheng Zhang, Chunyi Li, Shuaicheng Liu, Guangtao Zhai, and Xiaohong Liu. Learning hazing to dehazing: Towards realistic haze generation for real-world image dehazing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23091–23100, 2025.

  72. [72]

    Promptrr: Diffusion models as prompt generators for single image reflection removal

    Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, and Ming-Hsuan Yang. Promptrr: Diffusion models as prompt generators for single image reflection removal. arXiv preprint arXiv:2402.02374,

  73. [73]

    Ucl-dehaze: Toward real-world image dehazing via unsupervised contrastive learning

    Yongzhen Wang, Xuefeng Yan, Fu Lee Wang, Haoran Xie, Wenhan Yang, Xiao-Ping Zhang, Jing Qin, and Mingqiang Wei. Ucl-dehaze: Toward real-world image dehazing via unsupervised contrastive learning. IEEE Transactions on Image Processing, 33:1361–1374, 2024.

  74. [74]

    Single image reflection removal beyond linearity

    Qiang Wen, Yinjie Tan, Jing Qin, Wenxi Liu, Guoqiang Han, and Shengfeng He. Single image reflection removal beyond linearity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3771–3779, 2019.

  75. [75]

    How to train neural networks for flare removal

    Yicheng Wu, Qiurui He, Tianfan Xue, Rahul Garg, Jiawen Chen, Ashok Veeraraghavan, and Jonathan T Barron. How to train neural networks for flare removal. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2239–2247, 2021.

  76. [76]

    Detail-preserving latent diffusion for stable shadow removal

    Jiamin Xu, Yuxin Zheng, Zelong Li, Chi Wang, Renshu Gu, Weiwei Xu, and Gang Xu. Detail-preserving latent diffusion for stable shadow removal. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7592–7602, 2025.

  77. [77]

    Dfdnet: Dynamic frequency-guided de-flare network

    Minglong Xue, Aoxiang Ning, Shivakumara Palaiahnakote, and Mingliang Zhou. Dfdnet: Dynamic frequency-guided de-flare network. arXiv preprint arXiv:2507.17489, 2025.

  78. [78]

    Proximal dehaze-net: A prior learning-based deep network for single image dehazing

    Dong Yang and Jian Sun. Proximal dehaze-net: A prior learning-based deep network for single image dehazing. In Proceedings of the european conference on computer vision (ECCV), pages 702–717, 2018.

  79. [79]

    Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal

    Jie Yang, Dong Gong, Lingqiao Liu, and Qinfeng Shi. Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal. In Proceedings of the european conference on computer vision (ECCV), pages 654–669, 2018.

  80. [80]

    Nighttime dehazing with a synthetic benchmark

    Jing Zhang, Yang Cao, Zheng-Jun Zha, and Dacheng Tao. Nighttime dehazing with a synthetic benchmark. In Proceedings of the 28th ACM international conference on multimedia, pages 2355–2363, 2020.

Showing first 80 references.