
arxiv: 2604.09991 · v1 · submitted 2026-04-11 · 💻 cs.CV

Revisiting the Scale Loss Function and Gaussian-Shape Convolution for Infrared Small Target Detection

Pith reviewed 2026-05-10 16:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords infrared small target detection · scale loss function · Gaussian convolution · monotonic gradients · spatial attention · rotated pinwheel mask · target detection

The pith

A diff-based scale loss and Gaussian-shaped convolution improve infrared small target detection by stabilizing training and matching target profiles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix unstable training from scale losses that lack monotonic gradients and weak spatial focus from standard convolutions that ignore how small targets concentrate intensity in infrared images. It replaces the loss with one based on signed area differences between predicted and true masks to guarantee consistent gradient directions, and replaces kernels with Gaussian-shaped ones that learn a scale parameter while using a rotated pinwheel mask to adapt to target direction. A reader would care because reliable detection of tiny infrared signals matters for surveillance, tracking, and warning systems. The authors test the combination on three public datasets and report gains in overlap and detection rates over prior work.

Core claim

The authors claim that weighting predictions by the signed area difference between the predicted mask and ground truth produces strictly monotonic gradients for stable convergence, unlike earlier scale losses. They further claim that Gaussian-shaped convolution with a learnable scale parameter, combined with a rotated pinwheel mask aligned through a straight-through estimator, better captures the center-concentrated intensity profile of infrared small targets than generic kernels, yielding higher mIoU, Pd, and lower Fa on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB.
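To make the "center-concentrated" kernel idea concrete, here is a minimal sketch of a 2D Gaussian kernel whose width is set by a single scale value. It is an illustration under stated assumptions, not the authors' implementation: the function name `gaussian_kernel` and the example values of `sigma` are hypothetical, and in the paper the scale is learned during training rather than passed in by hand.

```python
import math


def gaussian_kernel(size=7, sigma=1.5):
    """Build a size x size Gaussian kernel normalized to sum to 1.

    `sigma` stands in for the learnable scale parameter; it is a plain
    argument here (an assumption made for illustration).
    """
    if size % 2 == 0 or sigma <= 0:
        raise ValueError("size must be odd and sigma must be positive")
    half = size // 2
    raw = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


narrow = gaussian_kernel(7, sigma=0.8)  # most weight sits on the center pixel
wide = gaussian_kernel(7, sigma=3.0)    # weight is spread across the window

# Print the center weight of each kernel for comparison.
print(round(narrow[3][3], 4), round(wide[3][3], 4))
```

A smaller `sigma` concentrates the kernel on the center pixel, which is the behavior the claim relies on for very small targets; a larger value flattens it toward a uniform window.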

What carries the argument

The diff-based scale loss, which weights by signed area difference to enforce monotonic gradients, together with Gaussian-shaped convolution that uses a learnable scale and a rotated pinwheel mask for orientation alignment.

Load-bearing premise

The signed area difference between any predicted and ground-truth mask always produces strictly monotonic gradients, and the intensity distribution of infrared small targets is adequately captured by a center-concentrated Gaussian profile.

What would settle it

A training run on any of the three datasets in which the proposed loss produces non-monotonic gradients for some mask configurations, or in which the Gaussian kernel method shows no improvement over baselines on targets whose intensity profiles deviate from a Gaussian shape.
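A check of this kind can be sketched numerically. The two weight functions below are hypothetical stand-ins chosen for illustration (a linear diff-style weight and a variance-style weight); neither is taken from the paper, and the check looks only at weight values along a slice with fixed `ap + at`, not at gradients of a full training loss.

```python
def diff_weight(ap, at):
    """Hypothetical diff-style weight: 1 when ap == at, falling linearly with |ap - at|."""
    return 1.0 - abs(ap - at) / (ap + at)


def var_weight(ap, at):
    """Hypothetical variance-style weight: (min + var) / (max + var)."""
    mean = (ap + at) / 2.0
    var = ((ap - mean) ** 2 + (at - mean) ** 2) / 2.0
    return (min(ap, at) + var) / (max(ap, at) + var)


def decays_strictly_from_match(weight, total=10.0, steps=20):
    """Return True if the weight strictly decreases as ap moves from total/2 to total.

    The sum ap + at is held fixed at `total`, so the predicted and true areas
    drift apart as ap grows.
    """
    half = total / 2.0
    samples = [half + k * half / steps for k in range(steps + 1)]
    values = [weight(ap, total - ap) for ap in samples]
    return all(later < earlier for earlier, later in zip(values, values[1:]))


print("diff-style weight decays strictly:", decays_strictly_from_match(diff_weight))
print("variance-style weight decays strictly:", decays_strictly_from_match(var_weight))
```

Under these assumed forms the variance-style weight dips and then rises again as the areas diverge, which is the kind of behavior a counter-example search would look for. Whether the paper's actual loss shares or avoids it has to be checked against the paper's own definitions.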

Figures

Figures reproduced from arXiv: 2604.09991 by Hao Li, Man Fung Zhuo.

Figure 1: Comparison between our monotonic diff-based scale loss and conventional non-monotonic scale losses. The proposed diff-based loss exhibits strict monotonicity with respect to scale deviation, stably penalizing mismatches between predicted and ground-truth target areas and naturally guiding optimization toward the optimal center, thus eliminating unstable gradients and ensuring stable convergence during trai…
Figure 2: Illustration of the Gaussian-like intensity distribution inherent to IRSTs. As shown, small targets exhibit a distinct center-concentrated, smoothly decaying grayscale pattern, which motivates the design of Gaussian-shaped spatial attention or convolution to align with this natural imaging characteristic, rather than using generic fully learned receptive fields.
Figure 3: Illustration of the directional morphological diversity of IRSTs. As shown, IRSTs exhibit varied directional structures (horizontal, vertical, or oblique point-like/elongated shapes), which motivates the use of learnable rotated pinwheel masks to adaptively align spatial attention with target orientation, complementing Gaussian-shaped convolution to better match the diverse directional properties of small …
Figure 4: Overview of the proposed framework. The U-Net encoder–decoder backbone applies channel attention and Gaussian-shaped spatial attention within each residual block. The 7 × 7 spatial attention kernel is constructed by combining a Gaussian prior (learnable σ) with a learnable rotated pinwheel mask whose orientation θ is optimized via a straight-through estimator, as illustrated in the upper branch. The final …
Figure 5: Visualization of the four scale weighting functions. Each row corresponds to one variant (Diff-based, Var-based, Mobius, Var-denominator). Left: contour map over the (Ap, At) plane. Middle: 3D surface. Right: anti-diagonal cross-section with Ap + At = 10, showing the weight value as Ap varies. Only the Diff-based weight decays strictly and monotonically away from Ap = At, while the Var-based weight is non-mo…
Figure 6: Qualitative segmentation comparison. Each row shows one test scene. Zoomed insets highlight the target region. Green pixels indicate true positives, red pixels indicate false positives, and yellow pixels indicate false negatives. L1-GP-Rotated consistently produces clean segmentation masks with fewer false alarms and missed detections compared to competing methods.
Figure 7: 3D prediction heatmap comparison. Each bar represents a detected blob: green bars are true detections at the correct location, red bars are false alarms. L1-GP-Rotated yields a single clean green bar aligned with the ground-truth target, demonstrating significantly better false alarm suppression.
Original abstract

Infrared small target detection still faces two persistent challenges: training instability from non-monotonic scale loss functions, and inadequate spatial attention due to generic convolution kernels that ignore the physical imaging characteristics of small targets. In this paper, we revisit both aspects. For the loss side, we propose a diff-based scale loss that weights predictions according to the signed area difference between the predicted mask and the ground truth, yielding strictly monotonic gradients and stable convergence. We further analyze a family of four scale loss variants to understand how their geometric properties affect detection behavior. For the spatial side, we introduce Gaussian-shaped convolution with a learnable scale parameter to match the center-concentrated intensity profile of infrared small targets, and augment it with a rotated pinwheel mask that adaptively aligns the kernel with target orientation via a straight-through estimator. Extensive experiments on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB demonstrate consistent improvements in mIoU, Pd, and Fa over state-of-the-art methods. We release our anonymous code and pretrained models.
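For readers unfamiliar with the three reported metrics, the sketch below computes simplified per-image versions on binary masks. The definitions are assumptions based on common usage (IoU as intersection over union, Fa as falsely predicted pixels over all pixels, Pd as detected targets over all targets); the overlap-based detection rule is a simplification, since benchmarks often use a centroid-distance rule, and dataset-level mIoU may be averaged or accumulated depending on the protocol. All function names are hypothetical.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    pairs = [(p, t) for rp, rt in zip(pred, truth) for p, t in zip(rp, rt)]
    inter = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    return inter / union if union else 1.0


def false_alarm_rate(pred, truth):
    """Fraction of all pixels that are predicted as target but are background."""
    pairs = [(p, t) for rp, rt in zip(pred, truth) for p, t in zip(rp, rt)]
    return sum(1 for p, t in pairs if p and not t) / len(pairs)


def detection_probability(pred, targets):
    """Fraction of targets hit by at least one predicted pixel.

    `targets` is a list of targets, each a list of (row, col) pixels.
    Overlap-based matching is an assumption made to keep the sketch short.
    """
    if not targets:
        return 0.0
    hits = sum(1 for pixels in targets if any(pred[r][c] for r, c in pixels))
    return hits / len(targets)


pred = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
truth = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
targets = [[(1, 1), (1, 2), (2, 1)]]  # one ground-truth target, three pixels

print(iou(pred, truth), false_alarm_rate(pred, truth), detection_probability(pred, targets))
```

In this toy case the prediction covers two of the three target pixels and adds one stray pixel, so overlap is partial, one false-alarm pixel is counted, and the single target still counts as detected.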

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a diff-based scale loss, which uses the signed area difference between predicted and ground-truth masks, produces strictly monotonic gradients and therefore stable training for infrared small target detection. It analyzes four scale-loss variants geometrically and introduces Gaussian-shaped convolution with a learnable scale parameter, plus a rotated pinwheel mask aligned via a straight-through estimator, to better match target intensity profiles. It reports consistent gains in mIoU, Pd, and Fa over state-of-the-art methods on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB.

Significance. If the monotonicity property and Gaussian-profile assumption hold, the approach could stabilize training and improve spatial attention for small-target tasks in surveillance and remote sensing; the multi-dataset evaluation and release of code/pretrained models are positive for reproducibility and allow direct comparison.

major comments (2)
  1. [diff-based scale loss and variant analysis] The central claim that signed-area-difference weighting yields strictly monotonic gradients for any mask configuration (abstract and loss-function section) is load-bearing for the stability and performance assertions, yet the geometric analysis of the four variants supplies no exhaustive enumeration, counter-example search, or discrete-grid verification; partial overlaps, boundary pixels, or non-convex predictions can produce non-monotonic loss values or gradient reversals even as IoU improves.
  2. [experimental results] The tables and ablation results in the experimental section do not isolate the incremental contribution of the diff-based loss versus the Gaussian convolution and pinwheel mask; without such breakdowns it is difficult to confirm that the reported gains on the three datasets are attributable to the proposed mechanisms rather than dataset-specific tuning or baseline differences.
minor comments (2)
  1. [Gaussian-shaped convolution] The precise formulation of the rotated pinwheel mask and its straight-through estimator integration would benefit from an explicit equation or algorithm box to aid implementation.
  2. [method overview] Notation for the learnable scale parameter and signed area difference should be introduced with a single consistent symbol set rather than varying across text and figures.
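As an indication of what the explicit formulation requested in the first minor comment might look like, the generic straight-through estimator for a discretized orientation can be written as below. This is a textbook-style sketch under assumed notation, not the paper's equation: the symbols $K$ (number of allowed orientations), $\hat{\theta}$ (snapped angle), and $\mathcal{L}$ (training loss) are hypothetical, and the construction of the pinwheel mask from $\hat{\theta}$ is not reproduced here.

```latex
% Hypothetical notation, not taken from the paper:
% \theta is the learnable angle, K the number of allowed orientations,
% \hat{\theta} the snapped angle used to build the mask, \mathcal{L} the loss.
\begin{align}
  \hat{\theta} &= \frac{2\pi}{K}\,\operatorname{round}\!\left(\frac{K\,\theta}{2\pi}\right)
    && \text{(forward: snap to the nearest allowed orientation)} \\
  \frac{\partial \mathcal{L}}{\partial \theta} &:= \frac{\partial \mathcal{L}}{\partial \hat{\theta}}
    && \text{(backward: treat } \partial\hat{\theta}/\partial\theta \text{ as } 1\text{)}
\end{align}
```

The rounding step has zero gradient almost everywhere, so the backward rule substitutes an identity gradient to let the angle receive a training signal.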

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the manuscript.

  1. Referee: [diff-based scale loss and variant analysis] The central claim that signed-area-difference weighting yields strictly monotonic gradients for any mask configuration (abstract and loss-function section) is load-bearing for the stability and performance assertions, yet the geometric analysis of the four variants supplies no exhaustive enumeration, counter-example search, or discrete-grid verification; partial overlaps, boundary pixels, or non-convex predictions can produce non-monotonic loss values or gradient reversals even as IoU improves.

    Authors: We appreciate the referee's careful scrutiny of the monotonicity claim, which is indeed central to our contribution. The geometric analysis in the loss-function section shows that the signed area difference produces monotonic gradients with respect to the scale parameter for the considered variants. To rigorously address potential issues in discrete settings and complex overlaps, we will augment the analysis with discrete-grid verifications, exhaustive checks on small mask configurations, and a search for counter-examples in the revised manuscript. This will either confirm the property or allow us to qualify the claim appropriately.
    Revision: partial

  2. Referee: [experimental results] The tables and ablation results in the experimental section do not isolate the incremental contribution of the diff-based loss versus the Gaussian convolution and pinwheel mask; without such breakdowns it is difficult to confirm that the reported gains on the three datasets are attributable to the proposed mechanisms rather than dataset-specific tuning or baseline differences.

    Authors: We agree that additional ablation studies would better isolate the contributions of each component. In the revised manuscript, we will include new tables and experiments that ablate the diff-based scale loss, the Gaussian-shaped convolution, and the rotated pinwheel mask individually across the three benchmarks. This will provide clear evidence of their incremental impacts on mIoU, Pd, and Fa.
    Revision: yes
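The ablation promised here amounts to enumerating on/off combinations of the three components. A minimal sketch of that enumeration follows; the flag names are hypothetical, and some combinations may not be meaningful in practice (for example, the pinwheel mask is described as an augmentation of the Gaussian convolution), so a real study might prune the grid.

```python
from itertools import product

# Hypothetical flag names for the three proposed components.
COMPONENTS = ("diff_loss", "gaussian_conv", "pinwheel_mask")


def ablation_configs(components=COMPONENTS):
    """Yield every on/off combination of the components as a dict."""
    for flags in product((False, True), repeat=len(components)):
        yield dict(zip(components, flags))


configs = list(ablation_configs())

# Print one label per configuration, with "baseline" for the all-off case.
for cfg in configs:
    enabled = [name for name, on in cfg.items() if on] or ["baseline"]
    print(" + ".join(enabled))
```

Running each configuration on each of the three datasets would give the per-component breakdown the referee asks for.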

Circularity Check

0 steps flagged

No circularity: proposals are new modules validated empirically, not reductions to inputs by construction


The paper proposes a diff-based scale loss (weighted by signed area difference) and Gaussian-shaped convolution (with learnable scale and rotated pinwheel mask via straight-through estimator) as solutions to stated challenges. These are motivated by geometric analysis and physical imaging assumptions rather than derived from prior equations or self-citations that presuppose the results. Performance claims rest on experiments across three datasets (IRSTD-1k, NUDT-SIRST, SIRST-UAVB) showing gains in mIoU, Pd, and Fa, with no load-bearing step where a 'prediction' or uniqueness theorem reduces tautologically to fitted parameters or author prior work. The monotonic gradient assertion follows directly from the loss definition without circular redefinition.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The approach introduces two new algorithmic components whose justification rests on domain observations rather than derivation from first principles; the learnable scale is the only explicit free parameter.

free parameters (1)
  • learnable scale parameter
    Width of the Gaussian kernel is optimized during training rather than fixed a priori.
axioms (1)
  • domain assumption: Infrared small targets exhibit a center-concentrated intensity profile that can be approximated by a Gaussian.
    Invoked to motivate the choice of kernel shape.
invented entities (2)
  • diff-based scale loss (no independent evidence)
    purpose: Provide strictly monotonic gradients by weighting according to signed area difference.
    Newly defined loss function.
  • rotated pinwheel mask (no independent evidence)
    purpose: Adaptively align the convolution kernel with target orientation.
    New masking mechanism using straight-through estimator.

pith-pipeline@v0.9.0 · 5500 in / 1440 out tokens · 66745 ms · 2026-05-10T16:24:06.111596+00:00 · methodology

