
arxiv: 2605.21964 · v1 · pith:7QYPGGG6new · submitted 2026-05-21 · 💻 cs.CV · physics.optics

Dual-Integrated Low-Latency Single-Lens Infrared Computational Imaging for Object Detection

Pith reviewed 2026-05-22 06:57 UTC · model grok-4.3

classification 💻 cs.CV physics.optics
keywords computational imaging · infrared imaging · object detection · low-latency · physics-informed · single-lens camera · image reconstruction · YOLO

The pith

PDI-Net merges reconstruction and detection in a single pipeline for low-latency infrared object detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PDI-Net to combine infrared image reconstruction and object detection into one network for compact single-lens cameras. It trains with a full U-Net but runs inference with a semi-U-Net encoder that shares features directly with a YOLO detector, skipping full image reconstruction. Optical priors from field-dependent point spread functions are embedded through a PALS-Bridge module to keep detection accuracy high. This matters for resource-limited platforms where separate reconstruction and detection steps create too much delay, and where multi-lens infrared systems add weight. If the integration works, it supports real-time detection in lighter, cheaper infrared hardware for surveillance or navigation tasks.

Core claim

PDI-Net integrates infrared reconstruction with object detection by using a supervised U-Net only during training and a semi-U-Net encoder that shares multiscale features directly with a YOLO-based detector at inference time. A physics-aware large-small bridge (PALS-Bridge) uses field-dependent point spread function priors to adaptively modulate convolutional branches and bridge fidelity-oriented features with detection semantics. A physics-informed optical degradation simulation pipeline generates training data. On the M3FD benchmark under low-SNR conditions, the approach reduces inference time by 84.06 percent versus a pruned Rec+Det baseline while improving mAP@0.5:0.95 by 5.07 percent.
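The training/inference split described above can be pictured with a toy sketch. This is not the paper's network: the 1-D "layers", the function names (`encoder`, `decoder`, `detector_head`, `train_step`, `infer`) and the loss are all hypothetical stand-ins, chosen only to show that the decoder contributes to the training loss but is never called at inference.

```python
# Toy illustration only: every name here is hypothetical, not from the paper.

def downsample(x):
    """Halve resolution by averaging adjacent pairs (stand-in for an encoder stage)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    """Double resolution by repeating samples (stand-in for a decoder stage)."""
    return [v for v in x for _ in range(2)]

def encoder(x, depth=3):
    """Return multiscale features, finest scale first."""
    feats = []
    for _ in range(depth):
        x = downsample(x)
        feats.append(x)
    return feats

def decoder(feats):
    """Rebuild a full-resolution signal with skip connections (training only)."""
    x = feats[-1]
    for skip in reversed(feats[:-1]):
        x = [a + b for a, b in zip(upsample(x), skip)]
    return upsample(x)

def detector_head(feats):
    """Stand-in detector: one score per scale (peak activation)."""
    return [max(f) for f in feats]

def train_step(degraded, clean):
    """Training path: reconstruction loss plus detection on shared features."""
    feats = encoder(degraded)
    recon = decoder(feats)
    rec_loss = sum((r - c) ** 2 for r, c in zip(recon, clean)) / len(clean)
    return rec_loss, detector_head(feats)

def infer(degraded):
    """Inference path: encoder features go straight to the detector."""
    return detector_head(encoder(degraded))
```

With a length-16 input, `infer` returns the same detection scores that `train_step` computes, while skipping the decoder entirely; that skipped work is where the latency saving would come from in this simplified picture.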

What carries the argument

The physics-aware large-small bridge (PALS-Bridge), which modulates multiscale convolutional branches with field-dependent point spread function priors to adapt reconstruction features for detection without full image output.
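One way to read "modulating large and small branches with a PSF prior" is as a per-position gate that blends a small-kernel branch with a large-kernel branch. The sketch below is an assumption made for illustration, not the PALS-Bridge as published: the box filters, the sigmoid gate, and the parameters `pivot`, `sharpness`, `small_k`, and `large_k` are all hypothetical.

```python
import math

def box_filter(signal, k):
    """1-D moving average with odd window k and edge clamping."""
    half = k // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)]
        out.append(sum(window) / k)
    return out

def psf_gate(psf_width, pivot=2.0, sharpness=1.5):
    """Hypothetical gate in (0, 1): a wider local PSF favors the large-kernel branch."""
    return 1.0 / (1.0 + math.exp(-sharpness * (psf_width - pivot)))

def bridge(features, psf_widths, small_k=3, large_k=7):
    """Blend small- and large-kernel branches, position by position."""
    small = box_filter(features, small_k)
    large = box_filter(features, large_k)
    out = []
    for s, l, w in zip(small, large, psf_widths):
        g = psf_gate(w)  # field-dependent weight derived from the PSF prior
        out.append(g * l + (1.0 - g) * s)
    return out
```

Positions where the assumed PSF is narrow keep mostly fine-scale features, and positions where it is wide lean on the wider receptive field. Whether the real module behaves this way is exactly what the paper's ablations would need to show.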

If this is right

  • Single-lens infrared cameras can weigh about 50 percent less than traditional multi-lens designs while supporting real-time detection.
  • Inference latency drops enough for deployment on resource-constrained platforms without separate reconstruction steps.
  • Detection accuracy holds or improves under low signal-to-noise conditions compared to pruned reconstruction-plus-detection pipelines.
  • The method avoids reconstructing full images at test time by sharing encoder features directly with the detector.
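The "about 50 percent" weight figure can be checked against the two masses quoted in the paper's Figure 11 caption (700 g multi-lens, 372 g single-lens). The snippet below is plain arithmetic on those two numbers; the variable names are ours.

```python
multi_lens_g = 700.0   # traditional multi-lens camera, Figure 11(a)
single_lens_g = 372.0  # proposed single-lens camera, Figure 11(b)

saving = (multi_lens_g - single_lens_g) / multi_lens_g
print(f"{saving:.1%}")  # prints 46.9%
```

So the measured saving is a little under 47 percent, which the abstract rounds to "about 50%".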

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-integration pattern could apply to other wavelength ranges or sensor types where reconstruction latency limits real-time use.
  • Further tests with varied lens aberrations would show how far the PALS-Bridge priors generalize beyond the simulation pipeline.
  • Hardware prototypes could reveal whether the reported weight savings translate to improved battery life or portability in field deployments.

Load-bearing premise

The physics-informed optical degradation simulation and field-dependent PSF priors in PALS-Bridge accurately represent real single-lens infrared degradations and preserve detection-critical information during feature adaptation.
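To make the premise concrete, here is a minimal sketch of a field-dependent degradation: a blur whose width grows with distance from the image center, followed by additive noise. Everything in it is an assumption for illustration. The paper's pipeline uses PSFs modeled from the lens design, whereas this sketch substitutes an isotropic Gaussian, a linear growth law (`field_sigma`), and Gaussian noise, none of which come from the paper.

```python
import math
import random

def gaussian_kernel(sigma, radius=2):
    """Normalized (2*radius+1)^2 Gaussian kernel; stand-in for a local PSF."""
    ax = range(-radius, radius + 1)
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in ax] for y in ax]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def field_sigma(r, c, h, w, sigma_center=0.5, sigma_edge=2.0):
    """Hypothetical field dependence: blur grows linearly with normalized radius.

    Assumes h > 1 and w > 1 so the corner distance is non-zero.
    """
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = math.hypot(r - cy, c - cx) / math.hypot(cy, cx)
    return sigma_center + (sigma_edge - sigma_center) * rho

def degrade(image, noise_std=0.0, seed=0, radius=2):
    """Spatially varying blur plus additive Gaussian noise on a 2-D list image."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            k = gaussian_kernel(field_sigma(r, c, h, w), radius)
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    rr = min(max(r + dy, 0), h - 1)  # clamp at borders
                    cc = min(max(c + dx, 0), w - 1)
                    acc += k[dy + radius][dx + radius] * image[rr][cc]
            out[r][c] = acc + rng.gauss(0.0, noise_std)
    return out
```

A point source near a corner comes out more smeared than one at the center. The premise at stake is whether such modeled smearing matches what the physical lens does.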

What would settle it

Measure whether PDI-Net maintains its reported mAP gain and latency reduction when run on raw images from an actual single-lens infrared prototype camera under the same low-SNR conditions used in the M3FD tests.
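A latency comparison of this kind needs only a small timing harness. The sketch below assumes each pipeline is wrapped as a callable `fn(sample)`; that interface, the warm-up and run counts, and the helper names are hypothetical. The last helper converts a percentage reduction into a speedup factor.

```python
import statistics
import time

def median_latency_ms(fn, sample, warmup=3, runs=20):
    """Median wall-clock latency of fn(sample) in milliseconds."""
    for _ in range(warmup):  # discard cold-start effects
        fn(sample)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(sample)
        times.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times)

def latency_reduction_pct(baseline_ms, candidate_ms):
    """Percentage by which the candidate is faster than the baseline."""
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms

def speedup_from_reduction(reduction_pct):
    """Speedup factor implied by a percentage latency reduction."""
    return 1.0 / (1.0 - reduction_pct / 100.0)
```

Applied to the reported figure, an 84.06 percent reduction means the integrated pipeline runs in 15.94 percent of the baseline time, a speedup of roughly 6.3x. A hardware test would check whether that ratio survives on real single-lens captures.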

Figures

Figures reproduced from arXiv: 2605.21964 by Dapeng Yan, Guishuo Yang, Jiande Sun, Kai Zhang, Xinbin Cheng, Xiong Dun, Xuanyu Qian, Xuquan Wang, Yujie Xing, Zhanshan Wang.

Figure 1: A comparative illustration of infrared object detection methodologies employing distinct imaging strategies.
Figure 2: Overview of the proposed PDI-Net for low-latency single-lens infrared computational imaging.
Figure 3: (a) Feature discrepancy between reconstruction and detection modules. (b) Detailed architecture of the PALS-Bridge. (c) The partitioned PSF pattern
Figure 4: Simulation-based dataset generation process for single-lens infrared computational imaging cameras.
Figure 5: Qualitative comparison of infrared object detection with different imaging strategies. (a) Traditional imaging. (b) Rec+Det. (c) Rec+Det with pruning.
Figure 6: Heatmap comparison of different infrared imaging strategies. (a) Ground truth. (b) Traditional imaging. (c) Rec+Det. (d) Rec+Det with pruning. (e)
Figure 7: Ablation study and sub-connection methods of the feature-sharing layer. (a) Strategy.1: the first ConvBlock is used as the feature-sharing layer. (b)
Figure 8: Qualitative comparison with traditional imaging frameworks on the M
Figure 9: Qualitative comparison with traditional imaging frameworks on the FLIR
Figure 10: Detection results of the proposed PDI-Net with 50% uniform structured pruning and INT8 precision quantization, visualized on the raw and
Figure 11: Imaging system comparison and UAV integration. (a) Traditional multi-lens infrared camera (700 g). (b) Proposed single-lens camera (372 g). (c),
Original abstract

Computational imaging enables compact infrared systems, but deep-learning pipelines that combine image reconstruction and object detection often introduce substantial inference latency. Most existing acceleration strategies compress the reconstruction network while overlooking physical priors from the optical path, leaving a trade-off between accuracy and speed. We present Physics-aware Dual-Integrated Network (PDI-Net), a low-latency framework that integrates infrared reconstruction with object detection and further embeds optical priors into the learning process. PDI-Net uses a supervised U-Net during training, while a semi-U-Net encoder shares features directly with a YOLO-based detector during inference, avoiding full image reconstruction. To bridge the gap between fidelity-oriented reconstruction features and detection-oriented semantics, we introduce a physics-aware large-small bridge (PALS-Bridge), which uses field-dependent point spread function priors to adaptively modulate multiscale convolutional branches. A physics-informed optical degradation simulation pipeline is also developed for training and validation. The method is deployed on a single-lens infrared camera, reducing system weight by about 50% compared with traditional multi-lens designs. On the M3FD benchmark under low-SNR conditions, PDI-Net reduces inference time by 84.06% compared with the Rec+Det with pruning strategy while improving mAP@0.5:0.95 by 5.07%. These results demonstrate compact, low-latency computational infrared imaging for real-time object detection on resource-constrained platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents PDI-Net, a physics-aware dual-integrated network for low-latency single-lens infrared computational imaging for object detection. It employs a supervised U-Net during training but switches to a semi-U-Net encoder that shares multiscale features directly with a YOLO-based detector at inference, bypassing full image reconstruction. Optical priors are embedded via the PALS-Bridge module, which uses field-dependent PSF priors to modulate large-small convolutional branches, supported by a physics-informed optical degradation simulation pipeline for training and validation. The system is deployed on a single-lens IR camera (claiming ~50% weight reduction vs. multi-lens designs). On the M3FD benchmark under low-SNR conditions, it reports an 84.06% inference-time reduction relative to a pruned Rec+Det baseline while improving mAP@0.5:0.95 by 5.07%.

Significance. If the simulation pipeline and PALS-Bridge successfully preserve detection-critical high-frequency cues under realistic single-lens IR degradations, the work would offer a practical route to compact, real-time computational IR systems on resource-constrained platforms. The explicit integration of optical priors into the feature-adaptation stage and the training-to-inference architectural split are strengths that could reduce latency without sacrificing accuracy in reconstruction-detection pipelines.

major comments (1)
  1. Abstract and Deployment Statement: The central quantitative claims (84.06% latency reduction and 5.07% mAP@0.5:0.95 gain on M3FD low-SNR) rest on the premise that the physics-informed optical degradation simulation pipeline plus field-dependent PSF priors in PALS-Bridge produce detection-ready features without full reconstruction. The text provides no evidence that real captured PSFs, image pairs, or hardware measurements from the deployed single-lens camera were used to calibrate or validate the simulation; if the modeled aberrations deviate from physical optics, the multiscale modulation may discard cues that the reported mAP improvement assumes are retained.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We address the major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the presentation of our simulation pipeline and deployment claims.

Point-by-point responses
  1. Referee: Abstract and Deployment Statement: The central quantitative claims (84.06% latency reduction and 5.07% mAP@0.5:0.95 gain on M3FD low-SNR) rest on the premise that the physics-informed optical degradation simulation pipeline plus field-dependent PSF priors in PALS-Bridge produce detection-ready features without full reconstruction. The text provides no evidence that real captured PSFs, image pairs, or hardware measurements from the deployed single-lens camera were used to calibrate or validate the simulation; if the modeled aberrations deviate from physical optics, the multiscale modulation may discard cues that the reported mAP improvement assumes are retained.

    Authors: We appreciate the referee highlighting this important clarification needed in our presentation. The physics-informed optical degradation simulation pipeline relies on modeled field-dependent PSF priors generated from the optical design parameters of the single-lens infrared camera (using standard ray-tracing and diffraction models), rather than direct calibration against real captured PSFs or paired hardware measurements. This simulation-based approach is employed because obtaining large-scale, precisely registered real IR image pairs under varying low-SNR conditions with the exact single-lens setup is practically challenging and not always feasible for training deep networks. The PALS-Bridge uses these priors to adaptively modulate multiscale features, and all reported mAP and latency results are obtained by applying the simulated degradations to the M3FD benchmark for controlled, reproducible evaluation. The deployment claim in the abstract refers specifically to the physical single-lens camera hardware achieving the ~50% weight reduction, which was verified through system integration and weight measurements, independent of the end-to-end detection metrics. We agree that the manuscript should more explicitly distinguish between simulation for algorithmic validation and hardware for system-level benefits to avoid any implication of direct real-PSF calibration. In the revised manuscript, we will update the abstract, add a new subsection detailing the PSF prior generation process (including optical parameters used), and explicitly state the simulation-based nature of the performance evaluation. This revision will directly address concerns about potential deviation from physical optics and the retention of detection-critical cues. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The PDI-Net framework pairs a supervised U-Net for training with a semi-U-Net encoder that shares features directly with a YOLO-based detector at inference. It is augmented by the PALS-Bridge module, which applies field-dependent PSF priors to modulate multiscale branches, and by a separate physics-informed optical degradation simulation pipeline. These components rely on standard, externally established architectures (U-Net, YOLO) and on optical priors motivated outside the present work, not on any internal fitting or self-referential definition. The reported gains (84.06% latency reduction and 5.07% mAP improvement on M3FD under low-SNR conditions) are framed as empirical benchmark results, not as quantities forced by construction from the equations or prior self-citations. The derivation chain therefore remains self-contained, with independent content supplied by the network topology and the physics priors.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The central performance claims rest on the accuracy of the physics-informed simulation and the effectiveness of the newly introduced PALS-Bridge; these are domain assumptions without independent falsifiable evidence supplied in the abstract.

axioms (2)
  • domain assumption: Field-dependent point spread functions from the single-lens optical path can be used to adaptively modulate multiscale features and bridge reconstruction-oriented and detection-oriented representations.
    Invoked to justify the PALS-Bridge design and its expected benefit.
  • domain assumption: A physics-informed optical degradation simulation pipeline produces training data sufficiently representative of real single-lens infrared camera behavior.
    Used for both training and validation of the network.
invented entities (3)
  • PDI-Net (no independent evidence)
    purpose: Dual-integrated low-latency framework combining reconstruction and detection
    New overall architecture proposed in the paper.
  • PALS-Bridge (no independent evidence)
    purpose: Physics-aware module that modulates multiscale convolutional branches using PSF priors
    Introduced specifically to address the feature gap between reconstruction and detection.
  • physics-informed optical degradation simulation pipeline (no independent evidence)
    purpose: Generate realistic degraded infrared images for training and validation
    Developed to support the supervised training of the network.

pith-pipeline@v0.9.0 · 5816 in / 1614 out tokens · 57824 ms · 2026-05-22T06:57:22.813672+00:00 · methodology


