pith. machine review for the scientific record.

arxiv: 2604.13981 · v1 · submitted 2026-04-15 · 💻 cs.CV

Recognition: unknown

HiProto: Hierarchical Prototype Learning for Interpretable Object Detection Under Low-quality Conditions

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 13:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords object detection · prototype learning · interpretability · low-quality images · hierarchical features · contrastive loss · semantic discrimination · computer vision

The pith

Hierarchical prototype learning delivers competitive object detection in low-quality images with built-in interpretability from class-centered feature associations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that extending prototype learning into a hierarchy across feature levels lets detectors associate image regions with representative class semantics even when quality drops due to darkness, fog or similar degradations. This replaces the common routes of first enhancing the input image or stacking more complex network modules, and instead uses targeted losses to sharpen focus on relevant areas, maintain separation between class prototypes, and generate scale-consistent pseudo labels. A sympathetic reader would care because the resulting system reports both detection scores and visible prototype responses that explain why a region triggered a particular class label. The approach therefore offers a direct route to more trustworthy detectors for settings where image quality varies unpredictably.

Core claim

HiProto builds structured prototype representations at multiple feature levels to capture class-specific semantics. It does so by applying a Region-to-Prototype Contrastive Loss that pulls prototypes toward target regions, a Prototype Regularization Loss that increases separation among class prototypes, and a Scale-aware Pseudo Label Generation Strategy that prevents mismatched supervision from reaching lower-level prototypes. On ExDark, RTTS and VOC2012-FOG the resulting detector reaches competitive accuracy while exposing clear prototype responses for each prediction.
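
To make the machinery concrete, here is a minimal PyTorch sketch of a region-to-prototype contrastive pull over a per-class prototype bank. The abstract gives no equations, so the InfoNCE form, the tensor shapes, and the temperature `tau` below are assumptions rather than the authors' implementation; HiProto maintains such banks at several feature levels, whereas this sketch shows a single level.

```python
# A minimal sketch, not the authors' code: one prototype per class at a
# single feature level. HiProto builds such banks hierarchically.
import torch
import torch.nn.functional as F

def rpc_loss(region_feats: torch.Tensor,
             labels: torch.Tensor,
             prototypes: torch.Tensor,
             tau: float = 0.1) -> torch.Tensor:
    """Hypothetical Region-to-Prototype Contrastive loss.

    region_feats: (N, D) pooled features of N labeled regions
    labels:       (N,)   class index of each region
    prototypes:   (C, D) learnable class prototypes (assumed shape)
    tau:          temperature, an assumed hyperparameter
    """
    z = F.normalize(region_feats, dim=1)   # unit-norm region embeddings
    p = F.normalize(prototypes, dim=1)     # unit-norm class prototypes
    logits = z @ p.t() / tau               # (N, C) scaled cosine similarities
    # InfoNCE-style pull: each region attracts its own class prototype
    # and repels the prototypes of every other class.
    return F.cross_entropy(logits, labels)
```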

What carries the argument

Hierarchical prototype representations that associate multi-scale features with class-centered semantics to improve both discrimination and interpretability.
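
The discrimination half of that claim rests on keeping prototypes apart, which is the stated job of the PR-Loss. One plausible form, again a sketch under assumed notation rather than the paper's formula, penalizes positive pairwise cosine similarity between class prototypes:

```python
# A minimal sketch, assuming PR-Loss penalizes pairwise prototype overlap;
# the paper's actual formulation is not given in the abstract.
import torch
import torch.nn.functional as F

def pr_loss(prototypes: torch.Tensor) -> torch.Tensor:
    """Hypothetical Prototype Regularization loss.

    prototypes: (C, D) class prototypes at one feature level
    """
    p = F.normalize(prototypes, dim=1)
    sim = p @ p.t()                                    # (C, C) cosine matrix
    off_diag = sim - torch.eye(p.size(0), device=p.device)
    # Penalize only positive overlap between distinct prototypes,
    # pushing the bank toward mutual (near-)orthogonality.
    return off_diag.clamp(min=0).pow(2).mean()
```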

If this is right

  • Detection accuracy stays competitive with standard methods on low-light, rainy and foggy images without any separate enhancement step.
  • Each detection can be inspected by viewing which prototypes activate for the predicted class and region.
  • The detector works inside ordinary network backbones rather than requiring specially engineered complex architectures.
  • Lower-level prototypes remain reliable because the pseudo-label strategy blocks incorrect higher-level supervision from propagating downward; one plausible scale-gating rule is sketched below.
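
To illustrate the last point, here is a minimal sketch of what scale-aware gating of pseudo labels could look like, borrowing the standard FPN level-assignment rule. SPLGS itself is not specified in the abstract, so the rule, the names, and the `base` threshold below are assumptions.

```python
# One plausible reading of "scale-aware" gating, not the paper's SPLGS:
# route each pseudo-labeled box to the feature level matching its size,
# so large-object supervision never reaches low-level prototypes.
import math

def assign_level(w: float, h: float, num_levels: int = 4,
                 base: float = 32.0) -> int:
    """FPN-style level assignment: k = floor(log2(sqrt(w*h) / base)),
    clamped to the valid range of feature levels."""
    k = int(math.floor(math.log2(max(math.sqrt(w * h) / base, 1e-6))))
    return max(0, min(num_levels - 1, k))

def gate_pseudo_labels(boxes, labels, num_levels: int = 4):
    """Group (box, label) pseudo pairs by assigned level so each level's
    prototypes only receive supervision at a matching scale."""
    per_level = {k: [] for k in range(num_levels)}
    for (x1, y1, x2, y2), lab in zip(boxes, labels):
        k = assign_level(x2 - x1, y2 - y1, num_levels)
        per_level[k].append(((x1, y1, x2, y2), lab))
    return per_level
```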

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multi-level prototype construction could be tested on other uncertain inputs such as motion-blurred or noisy video frames to check whether semantic stability transfers.
  • Prototype activation maps might serve as a lightweight way to audit training data for condition-specific biases before deployment.
  • If the prototypes prove reusable across datasets, they could reduce reliance on exhaustive per-condition labeling.

Load-bearing premise

That linking features to class-centered prototypes produces more stable and interpretable representations than ordinary feature maps once image quality degrades.

What would settle it

If, on ExDark, RTTS or VOC2012-FOG, removing the hierarchical structure or the contrastive loss produced no measurable drop in either detection accuracy or the clarity of prototype-based explanations for each output.

Figures

Figures reproduced from arXiv: 2604.13981 by Chaolei Yang, Jianlin Xiang, Linhui Dai, Xue Yang, Yanshan Li.

Figure 2. Overview of the proposed HiProto architecture. (a) Standard FPN-based detection architecture with decoupled heads. (b) Hierarchical prototype learning.
Figure 3. Comparison of prototype response maps.
Figure 5. Comparison of detection results under low-light scenes; HiProto versus the baseline and other methods.
Figure 7. Comparison of combined saliency maps Ŝ_k (with GT bounding boxes) across all feature layers on the RTTS dataset: Baseline, DNMGDT, DiffDehaze, SGDN, MASFNet, and HiProto (ours).
Figure 8. Comparison of combined saliency maps Ŝ_k (with GT bounding boxes) across all feature layers on the VOC2012-FOG dataset.
Figure 9. Comparison of detection results across multiple images under hazy scenes among Baseline, HiProto, and other representative methods.
original abstract

Interpretability is essential for deploying object detection systems in critical applications, especially under low-quality imaging conditions that degrade visual information and increase prediction uncertainty. Existing methods either enhance image quality or design complex architectures, but often lack interpretability and fail to improve semantic discrimination. In contrast, prototype learning enables interpretable modeling by associating features with class-centered semantics, which can provide more stable and interpretable representations under degradation. Motivated by this, we propose HiProto, a new paradigm for interpretable object detection based on hierarchical prototype learning. By constructing structured prototype representations across multiple feature levels, HiProto effectively models class-specific semantics, thereby enhancing both semantic discrimination and interpretability. Building upon prototype modeling, we first propose a Region-to-Prototype Contrastive Loss (RPC-Loss) to enhance the semantic focus of prototypes on target regions. Then, we propose a Prototype Regularization Loss (PR-Loss) to improve the distinctiveness among class prototypes. Finally, we propose a Scale-aware Pseudo Label Generation Strategy (SPLGS) to suppress mismatched supervision for RPC-Loss, thereby preserving the robustness of low-level prototype representations. Experiments on ExDark, RTTS, and VOC2012-FOG demonstrate that HiProto achieves competitive results while offering clear interpretability through prototype responses, without relying on image enhancement or complex architectures. Our code will be available at https://github.com/xjlDestiny/HiProto.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes HiProto, a hierarchical prototype learning framework for interpretable object detection under low-quality imaging conditions. It builds structured prototype representations at multiple feature levels and introduces RPC-Loss to focus prototypes on target regions, PR-Loss to increase distinctiveness among class prototypes, and SPLGS to generate scale-aware pseudo-labels that stabilize low-level supervision. Experiments on ExDark, RTTS, and VOC2012-FOG are claimed to show competitive detection performance together with interpretability via prototype responses, without image enhancement or complex architectures.

Significance. If the quantitative claims hold, the work could be significant for computer vision applications that require both accuracy and interpretability in degraded conditions (low light, fog, etc.). The use of hierarchical prototypes to model class-centered semantics offers a conceptually simpler alternative to enhancement pipelines, and the explicit release of code supports reproducibility.

major comments (1)
  1. Abstract: the central claim that HiProto 'achieves competitive results' on ExDark, RTTS, and VOC2012-FOG is unsupported by any numerical metrics, baseline comparisons, ablation tables, or error analysis. This absence is load-bearing for the paper's primary contribution of effectiveness plus interpretability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and the specific feedback on the abstract. We address the comment point by point below.

point-by-point responses
  1. Referee: [—] Abstract: the central claim that HiProto 'achieves competitive results' on ExDark, RTTS, and VOC2012-FOG is unsupported by any numerical metrics, baseline comparisons, ablation tables, or error analysis. This absence is load-bearing for the paper's primary contribution of effectiveness plus interpretability.

    Authors: We agree that the abstract should be self-contained and should explicitly support the claim of competitive results with concrete numbers. The full manuscript already contains quantitative mAP comparisons against multiple baselines (including enhancement-based and prototype-based detectors), ablation studies on RPC-Loss/PR-Loss/SPLGS, and qualitative prototype-response visualizations in Sections 4.2–4.4 and the supplementary material. To address the referee’s concern directly, we will revise the abstract to include the key numerical highlights (e.g., mAP on ExDark and RTTS relative to the strongest baselines) while preserving the emphasis on interpretability. This change makes the primary contribution immediately verifiable from the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces an original hierarchical prototype framework with newly defined components (RPC-Loss for region focus, PR-Loss for prototype separation, and SPLGS for pseudo-label stabilization) that are motivated directly from the interpretability goal under image degradation. These are not derived from or equivalent to prior fitted quantities within the paper; performance claims rest on empirical evaluation against external benchmarks (ExDark, RTTS, VOC2012-FOG) rather than any self-referential reduction or self-citation chain. The derivation chain remains self-contained with independent content.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Because only the abstract is available, a complete audit of free parameters, axioms, and invented entities is impossible. The central claim rests on the domain assumption that class-centered prototypes remain stable under image degradation and that the three proposed losses enforce the desired semantic properties. Typical loss-weighting hyperparameters are expected but unspecified.

free parameters (1)
  • loss weighting coefficients for RPC-Loss and PR-Loss
    Standard in multi-term training objectives; values not stated in the abstract (a plausible combined objective is sketched after this ledger).
axioms (1)
  • domain assumption: Prototype learning associates features with class-centered semantics and yields stable representations under degradation
    Explicitly stated as motivation in the abstract.
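
For concreteness, the ledger's single free-parameter entry corresponds to a multi-term objective of roughly the following shape; the additive form and the symbols λ₁ and λ₂ are assumptions, since the abstract does not state the equation.

```latex
% Assumed combined training objective (not stated in the abstract):
% detection loss plus the two prototype terms, each with its own weight.
\mathcal{L}_{\text{total}} =
    \mathcal{L}_{\text{det}}
  + \lambda_{1}\,\mathcal{L}_{\text{RPC}}
  + \lambda_{2}\,\mathcal{L}_{\text{PR}}
```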

pith-pipeline@v0.9.0 · 5563 in / 1317 out tokens · 39156 ms · 2026-05-10T13:23:51.130299+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "DeepDriving: Learning affordance for direct perception in autonomous driving," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722–2730.
  2. [2] P. Liu, C. Zhang, H. Qi, G. Wang, and H. Zheng, "Multi-attention DenseNet: A scattering medium imaging optimization framework for visual data pre-processing of autonomous driving systems," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 12, pp. 25396–25407, 2022.
  3. [3] T. Wang, K. Zhang, T. Shen, W. Luo, B. Stenger, and T. Lu, "Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, 2023, pp. 2654–2662.
  4. [4] W. Wu, J. Weng, P. Zhang, X. Wang, W. Yang, and J. Jiang, "Interpretable optimization-inspired unfolding network for low-light image enhancement," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  5. [5] S. A. Sharif, A. Myrzabekov, N. Khudjaev, R. Tsoy, S. Kim, and J. Lee, "Learning optimized low-light image enhancement for edge vision tasks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 6373–6383.
  6. [6] H. Zhou, W. Dong, X. Liu, Y. Zhang, G. Zhai, and J. Chen, "Low-light image enhancement via generative perceptual priors," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 10, 2025, pp. 10752–10760.
  7. [7] B. Lin, Y. Jin, Y. Wending, W. Ye, Y. Yuan, and R. T. Tan, "NightHaze: Nighttime image dehazing via self-prior learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 5209–5217.
  8. [8] W. Wang, X. Wang, W. Yang, and J. Liu, "Unsupervised face detection in the dark," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 1250–1266, 2022.
  9. [9] L. Dai, H. Liu, P. Song, H. Tang, R. Ding, and S. Li, "Edge-guided representation learning for underwater object detection," CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1078–1091, 2024.
  10. [10] C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su, "This looks like that: Deep learning for interpretable image recognition," Advances in Neural Information Processing Systems, vol. 32, 2019.
  11. [11] Z. Ding, G. Chen, Q. Zhang, H. Wu, and J. Qin, "CSC-PA: Cross-image semantic correlation via prototype attentions for single-network semi-supervised breast tumor segmentation," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 15632–15641.
  12. [12] Q. Tang, L. Fan, M. Pagnucco, and Y. Song, "Prototype-based image prompting for weakly supervised histopathological image segmentation," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30271–30280.
  13. [13] Y. Du, Z. Fu, Q. Liu, and Y. Wang, "Weakly supervised semantic segmentation by pixel-to-prototype contrast," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4320–4329.
  14. [14] A. Wu, Y. Han, L. Zhu, and Y. Yang, "Universal-prototype enhancing for few-shot object detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9567–9576.
  15. [15] H. Lee, M. Lee, and N. Kwak, "Few-shot object detection by attending to per-sample-prototype," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 2445–2454.
  16. [16] Y. Lin, T. Ye, S. Chen, Z. Fu, Y. Wang, W. Chai, Z. Xing, W. Li, L. Zhu, and X. Ding, "AGLLDiff: Guiding diffusion models towards unsupervised training-free real-world low-light image enhancement," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 5, 2025, pp. 5307–5315.
  17. [17] Q. Yan, Y. Feng, C. Zhang, G. Pang, K. Shi, P. Wu, W. Dong, J. Sun, and Y. Zhang, "HVI: A new color space for low-light image enhancement," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5678–5687.
  18. [18] X. Zhang, H. Ding, F. Xie, L. Pan, Y. Zi, K. Wang, and H. Zhang, "Beyond spatial domain: Cross-domain promoted Fourier convolution helps single image dehazing," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 10, 2025, pp. 10221–10229.
  19. [19] Y. Lan, Z. Cui, C. Liu, J. Peng, N. Wang, X. Luo, and D. Liu, "Exploiting diffusion prior for real-world image dehazing with unpaired training," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 4, 2025, pp. 4455–4463.
  20. [20] X. Su, S. Li, Y. Cui, M. Cao, Y. Zhang, Z. Chen, Z. Wu, Z. Wang, Y. Zhang, and X. Yuan, "Prior-guided hierarchical harmonization network for efficient image dehazing," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 7042–7050.
  21. [21] X. Wang, G. Yang, T. Ye, and Y. Liu, "Dehaze-RetinexGAN: Real-world image dehazing via Retinex-based generative adversarial network," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 8, 2025, pp. 7997–8005.
  22. [22] W. Wang, W. Yang, and J. Liu, "HLA-Face: Joint high-low adaptation for low light face detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16195–16204.
  23. [23] Z. Cui, G.-J. Qi, L. Gu, S. You, Z. Zhang, and T. Harada, "Multitask AET with orthogonal tangent regularity for dark object detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2553–2562.
  24. [24] S.-C. Huang, T.-H. Le, and D.-W. Jaw, "DSNet: Joint semantic learning for object detection in inclement weather conditions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 8, pp. 2623–2633, 2020.
  25. [25] W. Liu, G. Ren, R. Yu, S. Guo, J. Zhu, and L. Zhang, "Image-adaptive YOLO for object detection in adverse weather conditions," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 1792–1800.
  26. [26] K. A. Hashmi, G. Kallempudi, D. Stricker, and M. Z. Afzal, "FeatEnHancer: Enhancing hierarchical features for object detection and beyond under low-light vision," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 6725–6735.
  27. [27] X. Cui, L. Ma, T. Ma, J. Liu, X. Fan, and R. Liu, "Trash to treasure: Low-light object detection via decomposition-and-aggregation," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, 2024, pp. 1417–1425.
  28. [28] M. Hong, S. Cheng, H. Huang, H. Fan, and S. Liu, "You only look around: Learning illumination-invariant feature for low-light object detection," Advances in Neural Information Processing Systems, vol. 37, pp. 87136–87158, 2024.
  29. [29] J. Donnelly, A. J. Barnett, and C. Chen, "Deformable ProtoPNet: An interpretable image classifier using deformable prototypes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10265–10275.
  30. [30] M. Nauta, J. Schlötterer, M. Van Keulen, and C. Seifert, "PIP-Net: Patch-based intuitive prototypes for interpretable image classification," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2744–2753.
  31. [31] B.-S. Wang, C.-Y. Wang, and W.-C. Chiu, "MCPNet: An interpretable classifier via multi-level concept prototypes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 10885–10894.
  32. [32] Q. Chen, L. Yang, J.-H. Lai, and X. Xie, "Self-supervised image-specific prototype exploration for weakly supervised semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4288–4298.
  33. [33] F. Tang, Z. Xu, Z. Qu, W. Feng, X. Jiang, and Z. Ge, "Hunting attributes: Context prototype-aware learning for weakly supervised semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 3324–3334.
  34. [34] S. Duan, X. Yang, and N. Wang, "Multi-label prototype visual spatial search for weakly supervised semantic segmentation," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30241–30250.
  35. [35] X. Lu, W. Diao, Y. Mao, J. Li, P. Wang, X. Sun, and K. Fu, "Breaking immutable: Information-coupled prototype elaboration for few-shot object detection," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 1844–1852.
  36. [36] L. Zhang, B. Zhang, B. Shi, J. Fan, and T. Chen, "Few-shot cross-domain object detection with instance-level prototype-based meta-learning," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 10, pp. 9078–9089, 2024.
  37. [37] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, "Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection," Advances in Neural Information Processing Systems, vol. 33, pp. 21002–21012, 2020.
  38. [38] F. Lv, F. Lu, J. Wu, and C. Lim, "MBLLEN: Low-light image/video enhancement using CNNs," in BMVC, vol. 220, no. 1. Northumbria University, 2018, p. 4.
  39. [39] Y. Zhang, J. Zhang, and X. Guo, "Kindling the darkness: A practical low-light image enhancer," in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 1632–1640.
  40. [40] C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong, "Zero-reference deep curve estimation for low-light image enhancement," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1780–1789.
  41. [41] L. Ma, T. Ma, R. Liu, X. Fan, and Z. Luo, "Toward fast, flexible, and robust low-light image enhancement," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5637–5646.
  42. [42] Y. Cai, H. Bian, J. Lin, H. Wang, R. Timofte, and Y. Zhang, "Retinexformer: One-stage Retinex-based transformer for low-light image enhancement," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12504–12513.
  43. [43] D. Zheng, X.-M. Wu, S. Yang, J. Zhang, J.-F. Hu, and W.-S. Zheng, "Selective hourglass mapping for universal image restoration based on diffusion model," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 25445–25455.
  44. [44] X. Yi, H. Xu, H. Zhang, L. Tang, and J. Ma, "Diff-Retinex++: Retinex-driven reinforced diffusion model for low-light image enhancement," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  45. [45] Y. Su, N. Wang, Z. Cui, Y. Cai, C. He, and A. Li, "Real scene single image dehazing network with multi-prior guidance and domain transfer," IEEE Transactions on Multimedia, 2025.
  46. [46] R. Wang, Y. Zheng, Z. Zhang, C. Li, S. Liu, G. Zhai, and X. Liu, "Learning hazing to dehazing: Towards realistic haze generation for real-world image dehazing," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 23091–23100.
  47. [47] W. Fang, J. Fan, Y. Zheng, J. Weng, Y. Tai, and J. Li, "Guided real image dehazing using YCbCr color space," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 3, 2025, pp. 2906–2914.
  48. [48] Z. Liu, T. Fang, H. Lu, W. Zhang, and R. Lan, "MASFNet: Multi-scale adaptive sampling fusion network for object detection in adverse weather," IEEE Transactions on Geoscience and Remote Sensing, 2025.
  49. [49] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in European Conference on Computer Vision. Springer, 2020, pp. 213–229.
  50. [50] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.
  51. [51] L. Huang, X. Liu, B. Lang, A. Yu, Y. Wang, and B. Li, "Orthogonal weight normalization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
  52. [52] S.-A. Liu, Y. Zhang, Z. Qiu, H. Xie, Y. Zhang, and T. Yao, "Learning orthogonal prototypes for generalized few-shot semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11319–11328.
  53. [53] C. Ionescu, O. Vantzos, and C. Sminchisescu, "Training deep networks with structured layers by matrix backpropagation," arXiv preprint arXiv:1509.07838, 2015.
  54. [54] S. Wang, C.-I. Chang, S.-C. Yang, G.-C. Hsu, H.-H. Hsu, P.-C. Chung, S.-M. Guo, and S.-K. Lee, "3D ROC analysis for medical imaging diagnosis," in 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. IEEE, 2006, pp. 7545–7548.
  55. [55] Y. P. Loh and C. S. Chan, "Getting to know low-light images with the exclusively dark dataset," Computer Vision and Image Understanding, vol. 178, pp. 30–42, 2019.
  56. [56] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, "Benchmarking single-image dehazing and beyond," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 492–505, 2018.
  57. [57] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.