Revisiting the Scale Loss Function and Gaussian-Shape Convolution for Infrared Small Target Detection
Pith reviewed 2026-05-10 16:24 UTC · model grok-4.3
The pith
A diff-based scale loss and Gaussian-shaped convolution improve infrared small target detection by stabilizing training and matching target profiles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that weighting predictions by the signed area difference between the predicted mask and the ground truth produces strictly monotonic gradients, and hence stable convergence, unlike earlier scale losses. They further claim that Gaussian-shaped convolution with a learnable scale parameter, combined with a rotated pinwheel mask aligned through a straight-through estimator, captures the center-concentrated intensity profile of infrared small targets better than generic kernels, yielding higher mIoU and Pd and lower Fa on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB.
What carries the argument
The diff-based scale loss, which weights by signed area difference to enforce monotonic gradients, together with Gaussian-shaped convolution that uses a learnable scale and a rotated pinwheel mask for orientation alignment.
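To make the loss idea concrete, the sketch below computes the relative signed area difference d = (A_pred - A_gt) / A_gt from soft masks and folds it into a loss. This summary does not give the paper's formula, so `hypothetical_diff_scale_loss` (soft-IoU loss plus alpha * d^2) and every other name here are illustrative assumptions; plain Python lists stand in for tensors.

```python
def soft_area(mask):
    """Soft area: the sum of per-pixel foreground probabilities in [0, 1]."""
    return sum(sum(row) for row in mask)


def signed_area_difference(pred, gt, eps=1e-6):
    """Relative signed area difference d = (A_pred - A_gt) / (A_gt + eps).
    d > 0: over-segmentation; d < 0: under-segmentation; d == 0: equal areas."""
    a_gt = soft_area(gt)
    return (soft_area(pred) - a_gt) / (a_gt + eps)


def soft_iou(pred, gt, eps=1e-6):
    """Soft IoU between two equally sized masks given as lists of rows."""
    inter = sum(p * g for rp, rg in zip(pred, gt) for p, g in zip(rp, rg))
    union = soft_area(pred) + soft_area(gt) - inter
    return inter / (union + eps)


def hypothetical_diff_scale_loss(pred, gt, alpha=1.0):
    """HYPOTHETICAL stand-in: soft-IoU loss plus alpha * d**2.
    The scale term's derivative w.r.t. the predicted area is
    2 * alpha * d / (A_gt + eps), so its sign follows the sign of d:
    it shrinks over-segmented masks and grows under-segmented ones."""
    d = signed_area_difference(pred, gt)
    return (1.0 - soft_iou(pred, gt)) + alpha * d * d


gt = [[0, 0, 0],
      [0, 1, 0],
      [0, 0, 0]]
over = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]       # 5 px predicted for a 1 px target: d is about +4
under = [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]      # nothing predicted: d is about -1
```

Only the sign of d distinguishes over- from under-segmentation; the squared term is one hypothetical way to turn it into a penalty whose pull on the predicted area always points toward A_gt.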
Load-bearing premise
The signed area difference between any predicted and ground-truth mask always produces strictly monotonic gradients, and the intensity distribution of infrared small targets is adequately captured by a center-concentrated Gaussian profile.
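The Gaussian half of the premise can be sketched as an isotropic Gaussian envelope on a k x k grid whose width sigma is the learnable scale. How the paper combines the envelope with learned weights is not stated here; the elementwise modulation in `gaussian_shaped_kernel` and the log-sigma parameterization are assumptions for illustration.

```python
import math


def gaussian_envelope(size, sigma):
    """Isotropic 2D Gaussian on an odd size x size grid, normalized to sum to 1."""
    if size % 2 == 0 or sigma <= 0:
        raise ValueError("size must be odd and sigma must be positive")
    c = size // 2
    env = [[math.exp(-((y - c) ** 2 + (x - c) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(map(sum, env))
    return [[v / total for v in row] for row in env]


def gaussian_shaped_kernel(weights, log_sigma):
    """HYPOTHETICAL Gaussian-shaped kernel: learned k x k weights modulated
    elementwise by a Gaussian envelope, so taps near the centre dominate.
    The scale is stored as log_sigma so sigma = exp(log_sigma) stays positive."""
    env = gaussian_envelope(len(weights), math.exp(log_sigma))
    return [[w * v for w, v in zip(rw, rv)] for rw, rv in zip(weights, env)]


env = gaussian_envelope(5, 1.0)
kernel = gaussian_shaped_kernel([[1.0] * 5 for _ in range(5)], log_sigma=0.0)
```

A smaller sigma concentrates more weight on the centre tap, which is the knob the learnable scale would turn to match targets of different sizes.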
What would settle it
A training run on any of the three datasets in which the proposed loss produces non-monotonic gradients for some mask configurations, or in which the Gaussian kernel method shows no improvement over baselines on targets whose intensity profiles deviate from a Gaussian shape.
Original abstract
Infrared small target detection still faces two persistent challenges: training instability from non-monotonic scale loss functions, and inadequate spatial attention due to generic convolution kernels that ignore the physical imaging characteristics of small targets. In this paper, we revisit both aspects. For the loss side, we propose a diff-based scale loss that weights predictions according to the signed area difference between the predicted mask and the ground truth, yielding strictly monotonic gradients and stable convergence. We further analyze a family of four scale loss variants to understand how their geometric properties affect detection behavior. For the spatial side, we introduce Gaussian-shaped convolution with a learnable scale parameter to match the center-concentrated intensity profile of infrared small targets, and augment it with a rotated pinwheel mask that adaptively aligns the kernel with target orientation via a straight-through estimator. Extensive experiments on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB demonstrate consistent improvements in mIoU, Pd, and Fa over state-of-the-art methods. We release our anonymous code and pretrained models.
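The three reported metrics can be sketched as follows. The conventions are assumptions based on common IRSTD evaluation code, not taken from this paper: mIoU as total intersection over total union across the dataset, a ground-truth target counted as detected (Pd) when an unmatched predicted connected component has its centroid within 3 pixels, and Fa as the fraction of image pixels belonging to predicted components that matched no target (usually reported in units of 10^-6).

```python
import math
from collections import deque


def components(mask):
    """8-connected foreground components of a binary mask (list of 0/1 rows)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue, pixels = deque([(i, j)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def centroid(pixels):
    n = len(pixels)
    return sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n


def evaluate(preds, gts, dist_thresh=3.0):
    """Dataset-level mIoU, target-level Pd and pixel-level Fa."""
    inter = union = detected = n_targets = false_px = n_px = 0
    for pred, gt in zip(preds, gts):
        for rp, rg in zip(pred, gt):
            for p, g in zip(rp, rg):
                inter += p * g
                union += max(p, g)
        pred_comps = components(pred)
        matched = [False] * len(pred_comps)
        for target in components(gt):
            n_targets += 1
            ty, tx = centroid(target)
            for k, comp in enumerate(pred_comps):
                py, px = centroid(comp)
                if not matched[k] and math.hypot(py - ty, px - tx) < dist_thresh:
                    matched[k] = True  # a prediction can match at most one target
                    detected += 1
                    break
        # pixels of predicted components that matched no target are false alarms
        false_px += sum(len(c) for k, c in enumerate(pred_comps) if not matched[k])
        n_px += len(pred) * len(pred[0])
    return {"mIoU": inter / union if union else 0.0,
            "Pd": detected / n_targets if n_targets else 0.0,
            "Fa": false_px / n_px}


# Toy 6x6 example: one target hit, one missed, one isolated false-alarm pixel.
gt = [[0] * 6 for _ in range(6)]
gt[1][1] = gt[1][2] = 1
gt[4][4] = 1
pred = [[0] * 6 for _ in range(6)]
pred[1][1] = pred[1][2] = 1
pred[4][0] = 1
scores = evaluate([pred], [gt])
```

On the toy example, half the targets are found (Pd = 0.5), the intersection covers half the union (mIoU = 0.5), and one of 36 pixels is a false alarm (Fa = 1/36).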
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a diff-based scale loss using the signed area difference between predicted and ground-truth masks produces strictly monotonic gradients for stable training in infrared small target detection. It geometrically analyzes four scale-loss variants, introduces Gaussian-shaped convolution with a learnable scale parameter plus a rotated pinwheel mask aligned via a straight-through estimator to better match target intensity profiles, and reports consistent gains in mIoU, Pd, and Fa over SOTA methods on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB.
Significance. If the monotonicity property and Gaussian-profile assumption hold, the approach could stabilize training and improve spatial attention for small-target tasks in surveillance and remote sensing; the multi-dataset evaluation and release of code/pretrained models are positive for reproducibility and allow direct comparison.
major comments (2)
- [diff-based scale loss and variant analysis] The central claim that signed-area-difference weighting yields strictly monotonic gradients for any mask configuration (abstract and loss-function section) is load-bearing for the stability and performance assertions, yet the geometric analysis of the four variants supplies no exhaustive enumeration, counter-example search, or discrete-grid verification; partial overlaps, boundary pixels, or non-convex predictions can produce non-monotonic loss values or gradient reversals even as IoU improves.
- [experimental results] The tables and ablations in the experimental section do not isolate the incremental contribution of the diff-based loss versus the Gaussian convolution and pinwheel mask; without such a breakdown it is difficult to confirm that the reported gains on the three datasets come from the proposed mechanisms rather than from dataset-specific tuning or baseline differences.
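The discrete-grid verification requested in the first major comment can be made concrete with a brute-force harness: enumerate every binary prediction on a tiny grid and search for pairs where IoU improves but the loss rises. The paper's loss is not reproduced here; `hypothetical_area_loss` (IoU loss plus a squared signed-area term) is an assumed stand-in, and the harness compares loss values rather than gradients, so it is a first screen rather than a full test of the monotonic-gradient claim.

```python
import itertools


def iou(pred, gt):
    """IoU of two flattened binary masks (tuples of 0/1)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union if union else 1.0


def iou_loss(pred, gt):
    return 1.0 - iou(pred, gt)


def hypothetical_area_loss(pred, gt, alpha=1.0):
    """HYPOTHETICAL stand-in: IoU loss plus a squared signed-area term."""
    d = (sum(pred) - sum(gt)) / sum(gt)  # signed relative area difference
    return iou_loss(pred, gt) + alpha * d * d


def find_reversals(loss_fn, gt, limit=3):
    """Enumerate every binary prediction on gt's grid and return up to `limit`
    pairs (a, b) with IoU(a) < IoU(b) but loss(a) < loss(b), i.e. cases where
    the loss prefers the worse-overlapping mask."""
    scored = [(iou(m, gt), loss_fn(m, gt), m)
              for m in itertools.product((0, 1), repeat=len(gt))]
    found = []
    for iou_a, loss_a, a in scored:
        for iou_b, loss_b, b in scored:
            if iou_a < iou_b and loss_a < loss_b - 1e-12:
                found.append((a, b))
                if len(found) == limit:
                    return found
    return found


# 3x3 grid (512 candidate predictions) with a single-pixel target in the middle.
gt = (0, 0, 0,
      0, 1, 0,
      0, 0, 0)
```

Plain IoU loss yields no reversals, while the stand-in does (an empty prediction scores better than the target plus two neighbours), which illustrates why such a check is worth running on the actual loss rather than what the paper's loss does.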
minor comments (2)
- [Gaussian-shaped convolution] The precise formulation of the rotated pinwheel mask and its straight-through estimator integration would benefit from an explicit equation or algorithm box to aid implementation.
- [method overview] Notation for the learnable scale parameter and signed area difference should be introduced with a single consistent symbol set rather than varying across text and figures.
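As a hedged illustration of what such an algorithm box might contain, the sketch below builds a rotated pinwheel mask and snaps its orientation with a straight-through estimator. The arm geometry (four one-sided angular windows of width `arm_width` starting at theta + m*pi/2), the pi/8 snapping step, and all function names are assumptions, not the paper's formulation. In PyTorch the same estimator is commonly written as `theta + (snapped - theta).detach()`; here the forward value and the surrogate derivative are returned explicitly.

```python
import math


def pinwheel_mask(size, theta, arm_width=math.pi / 8, eps=1e-9):
    """HYPOTHETICAL rotated pinwheel mask on an odd size x size grid.
    A tap at polar angle `ang` (counter-clockwise from the centre, with image
    rows growing downward) is kept if, for some arm m in 0..3, ang lies in
    [theta + m*pi/2, theta + m*pi/2 + arm_width). The one-sided windows give
    the arms a consistent handedness, like a pinwheel.
    `eps` guards against floating-point wrap-around at a window's edge."""
    c = size // 2
    mask = [[0] * size for _ in range(size)]
    mask[c][c] = 1  # the centre tap is always kept
    for y in range(size):
        for x in range(size):
            if (y, x) == (c, c):
                continue
            ang = math.atan2(c - y, x - c)
            for m in range(4):
                rel = (ang - theta - m * math.pi / 2 + eps) % (2 * math.pi)
                if rel < arm_width:
                    mask[y][x] = 1
                    break
    return mask


def ste_snap(theta_raw, step=math.pi / 8):
    """Straight-through estimator for orientation snapping.
    Forward: theta is rounded to the nearest multiple of `step` (a step
    function whose true derivative is 0 almost everywhere). Backward: the
    rounding is treated as the identity, so the surrogate derivative is 1."""
    theta_snapped = step * round(theta_raw / step)
    surrogate_grad = 1.0
    return theta_snapped, surrogate_grad


plus = pinwheel_mask(3, 0.0)           # arms along 0, 90, 180 and 270 degrees
cross = pinwheel_mask(3, math.pi / 4)  # the same mask rotated by 45 degrees
theta, dtheta = ste_snap(0.2)          # 0.2 rad snaps to pi/8
```

On a 3x3 grid the masks reduce to a plus and an X; the pinwheel handedness only becomes visible on larger grids, where each arm's window covers taps on one side of its leading edge.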
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the manuscript.
Point-by-point responses
Referee: [diff-based scale loss and variant analysis] The central claim that signed-area-difference weighting yields strictly monotonic gradients for any mask configuration (abstract and loss-function section) is load-bearing for the stability and performance assertions, yet the geometric analysis of the four variants supplies no exhaustive enumeration, counter-example search, or discrete-grid verification; partial overlaps, boundary pixels, or non-convex predictions can produce non-monotonic loss values or gradient reversals even as IoU improves.
Authors: We appreciate the referee's careful scrutiny of the monotonicity claim, which is indeed central to our contribution. The geometric analysis in the loss-function section shows that the signed area difference produces monotonic gradients with respect to the scale parameter for the considered variants. To rigorously address potential issues in discrete settings and complex overlaps, we will augment the analysis with discrete-grid verifications, exhaustive checks on small mask configurations, and a search for counter-examples in the revised manuscript. This will either confirm the property or allow us to qualify the claim appropriately. revision: partial
Referee: [experimental results] Table or ablation results (experimental section) do not isolate the incremental contribution of the diff-based loss versus the Gaussian convolution and pinwheel mask; without such breakdowns it is difficult to confirm that the reported gains on the three datasets are attributable to the proposed mechanisms rather than dataset-specific tuning or baseline differences.
Authors: We agree that additional ablation studies would better isolate the contributions of each component. In the revised manuscript, we will include new tables and experiments that ablate the diff-based scale loss, the Gaussian-shaped convolution, and the rotated pinwheel mask individually across the three benchmarks. This will provide clear evidence of their incremental impacts on mIoU, Pd, and Fa. revision: yes
Circularity Check
No circularity: proposals are new modules validated empirically, not reductions to inputs by construction
full rationale
The paper proposes a diff-based scale loss (weighted by signed area difference) and Gaussian-shaped convolution (with learnable scale and rotated pinwheel mask via straight-through estimator) as solutions to stated challenges. These are motivated by geometric analysis and physical imaging assumptions rather than derived from prior equations or self-citations that presuppose the results. Performance claims rest on experiments across three datasets (IRSTD-1k, NUDT-SIRST, SIRST-UAVB) showing gains in mIoU, Pd, and Fa, with no load-bearing step where a 'prediction' or uniqueness theorem reduces tautologically to fitted parameters or author prior work. The monotonic gradient assertion follows directly from the loss definition without circular redefinition.
Axiom & Free-Parameter Ledger
free parameters (1)
- learnable scale parameter
axioms (1)
- domain assumption: Infrared small targets exhibit a center-concentrated intensity profile that can be approximated by a Gaussian.
invented entities (2)
- diff-based scale loss: no independent evidence
- rotated pinwheel mask: no independent evidence
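The domain assumption in this ledger can be probed on target chips. The sketch below, assuming background-subtracted, non-negative patches with nonzero total intensity, estimates an isotropic Gaussian scale from second moments (for a 2D isotropic Gaussian the mean squared radius is 2 * sigma^2) and reports an L1 misfit between the patch and the moment-matched Gaussian. The misfit measure is one simple choice, not one taken from the paper.

```python
import math


def moment_sigma(patch):
    """Centroid and isotropic Gaussian scale of an intensity patch via moments:
    E[(y - mu_y)^2 + (x - mu_x)^2] = 2 * sigma^2 for a 2D isotropic Gaussian."""
    total = sum(map(sum, patch))
    my = sum(y * v for y, row in enumerate(patch) for v in row) / total
    mx = sum(x * v for row in patch for x, v in enumerate(row)) / total
    m2 = sum(((y - my) ** 2 + (x - mx) ** 2) * v
             for y, row in enumerate(patch) for x, v in enumerate(row)) / total
    return my, mx, math.sqrt(m2 / 2.0)


def gaussian_misfit(patch):
    """L1 distance (in [0, 2]) between the patch and its moment-matched Gaussian,
    both normalized to unit sum; 0 means a perfectly Gaussian profile."""
    my, mx, s = moment_sigma(patch)
    g = [[math.exp(-((y - my) ** 2 + (x - mx) ** 2) / (2 * s * s))
          for x in range(len(patch[0]))] for y in range(len(patch))]
    tp, tg = sum(map(sum, patch)), sum(map(sum, g))
    return sum(abs(p / tp - q / tg)
               for rp, rg in zip(patch, g) for p, q in zip(rp, rg))


size, c = 15, 7
blob = [[math.exp(-((y - c) ** 2 + (x - c) ** 2) / (2 * 1.5 ** 2))
         for x in range(size)] for y in range(size)]   # Gaussian, sigma = 1.5
flat = [[1.0 if 5 <= y <= 9 and 5 <= x <= 9 else 0.0
         for x in range(size)] for y in range(size)]   # uniform 5x5 square
```

A Gaussian blob recovers its sigma with near-zero misfit, while a flat-topped square does not; consistently large misfits on real targets would be the kind of evidence described under "What would settle it".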
Reference graph
Works this paper leans on
[1] Aghaziyarati, S., Moradi, S., Talebi, H.: Small infrared target detection using absolute average difference weighted by cumulative directional derivatives. Infrared Physics & Technology 101, 78–87 (2019)
[2] Bae, T.W., Sohng, K.I.: Small target detection using bilateral filter based on edge component. Journal of Infrared, Millimeter, and Terahertz Waves 31(6), 735–743 (2010)
[3] Bai, X., Zhou, F.: Hit-or-miss transform based infrared dim small target enhancement. Optics & Laser Technology 43(7), 1084–1090 (2011)
[4] Chen, C.P., Li, H., Wei, Y., Xia, T., Tang, Y.Y.: A local contrast method for small infrared target detection. IEEE Transactions on Geoscience and Remote Sensing 52(1), 574–581 (2013)
[5] Dai, Y., Wu, Y., Zhou, F., Barnard, K.: Asymmetric contextual modulation for infrared small target detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 950–959 (2021)
[6] Dai, Y., Wu, Y., Zhou, F., Barnard, K.: Attentional local contrast networks for infrared small target detection. IEEE Transactions on Geoscience and Remote Sensing 59(11), 9813–9824 (2021)
[7] Deng, L., Zhu, H., Zhou, Q., Li, Y.: Adaptive top-hat filter based on quantum genetic algorithm for infrared small target detection. Multimedia Tools and Applications 77(9), 10539–10551 (2018)
[8] He, J., Erfani, S., Ma, X., Bailey, J., Chi, Y., Hua, X.S.: α-IoU: A family of power intersection over union losses for bounding box regression. Advances in Neural Information Processing Systems 34, 20230–20242 (2021)
[9] Kou, R., Wang, C., Peng, Z., Zhao, Z., Chen, Y., Han, J., Huang, F., Yu, Y., Fu, Q.: Infrared small target segmentation networks: A survey. Pattern Recognition 143, 109788 (2023)
[10] Li, B., Xiao, C., Wang, L., Wang, Y., Lin, Z., Li, M., An, W., Guo, Y.: Dense nested attention network for infrared small target detection. IEEE Transactions on Image Processing 32, 1745–1758 (2022)
[11] Liu, Q., Liu, R., Zheng, B., Wang, H., Fu, Y.: Infrared small target detection with scale and location sensitivity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17490–17499 (2024)
[12] Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 658–666 (2019)
[13] Tong, X., Sun, B., Wei, J., Zuo, Z., Su, S.: EAAU-Net: Enhanced asymmetric attention U-Net for infrared small target detection. Remote Sensing 13(16), 3200 (2021)
[14] Wang, H., Zhou, L., Wang, L.: Miss detection vs. false alarm: Adversarial learning for small object segmentation in infrared images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8509–8518 (2019)
[15] Wang, K., Du, S., Liu, C., Cao, Z.: Interior attention-aware network for infrared small target detection. IEEE Transactions on Geoscience and Remote Sensing 60, 1–13 (2022)
[16] Wu, T., Li, B., Luo, Y., Wang, Y., Xiao, C., Liu, T., Yang, J., An, W., Guo, Y.: MTU-Net: Multilevel TransUNet for space-based infrared tiny ship detection. IEEE Transactions on Geoscience and Remote Sensing 61, 1–15 (2023)
18 Hao Li and Man Fung Zhuo
[17] Wu, X., Hong, D., Chanussot, J.: UIU-Net: U-Net in U-Net for infrared small object detection. IEEE Transactions on Image Processing 32, 364–376 (2022)
[18] Xu, Y., Liu, P., Qian, W., Zhang, J., Kong, X., Wan, M.: Small and dim target detection under strong clutter based on similarly of Gaussian and motion outlier significance using moving infrared camera. IEEE Sensors Journal (2025)
[19] Yang, J., Liu, S., Wu, J., Su, X., Hai, N., Huang, X.: Pinwheel-shaped convolution and scale-based dynamic loss for infrared small target detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 9202–9210 (2025)
[20] Yuan, S., Qin, H., Yan, X., Akhtar, N., Mian, A.: SCTransNet: Spatial-channel cross transformer network for infrared small target detection. IEEE Transactions on Geoscience and Remote Sensing 62, 1–15 (2024)
[21] Zhang, H., Zhou, Z.: Small target detection based on automatic ROI extraction and local directional gray & entropy contrast map. Infrared Physics & Technology 107, 103290 (2020)
[22] Zhang, L., Peng, L., Zhang, T., Cao, S., Peng, Z.: Infrared small target detection via non-convex rank approximation minimization joint l2,1 norm. Remote Sensing 10(11), 1821 (2018)
[23] Zhang, M., Yue, K., Zhang, J., Li, Y., Gao, X.: Exploring feature compensation and cross-level correlation for infrared small target detection. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 1857–1865 (2022)
[24] Zhang, M., Zhang, R., Yang, Y., Bai, H., Zhang, J., Guo, J.: ISNet: Shape matters for infrared small target detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 877–886 (2022)
[25] Zhang, T., Wu, H., Liu, Y., Peng, L., Yang, C., Peng, Z.: Infrared small target detection based on non-convex optimization with Lp-norm constraint. Remote Sensing 11(5), 559 (2019)
[26] Zhang, X., Chi, J., Hu, J., Liu, L., Xing, Y.: Infrared small target detection using modified order morphology and weighted local entropy. In: 2nd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2017). pp. 356–365. Atlantis Press (2016)
[27] Zhang, Y., Li, Z.: A Gaussian weighted multi-scale method for infrared small target detection. In: 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT). pp. 465–469. IEEE (2025)
[28] Zhang, Y., Li, Z., Siddique, A., Azeem, A., Chen, W., Cao, D.: Infrared small target detection based on interpretation weighted sparse method. IEEE Transactions on Geoscience and Remote Sensing 63, 1–15 (2025)
[29] Zhao, M., Li, W., Li, L., Hu, J., Ma, P., Tao, R.: Single-frame infrared small-target detection: A survey. IEEE Geoscience and Remote Sensing Magazine 10(2), 87–119 (2022)
[30] Zhao, M., Cheng, L., Yang, X., Feng, P., Liu, L., Wu, N.: TBC-Net: A real-time detector for infrared small target detection using semantic constraint. arXiv preprint arXiv:2001.05852 (2019)
[31] Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D.: Distance-IoU loss: Faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34, pp. 12993–13000 (2020)
[32] Zhu, H., Ni, H., Liu, S., Xu, G., Deng, L.: TNLRS: Target-aware non-local low-rank modeling with saliency filtering regularization for infrared small target detection. IEEE Transactions on Image Processing 29, 9546–9558 (2020)