
arxiv: 2601.14690 · v2 · submitted 2026-01-21 · 💻 cs.CV

FeedbackSTS-Det: Sparse Frames-Based Spatio-Temporal Semantic Feedback Network for Moving Infrared Small Target Detection

Pith reviewed 2026-05-16 12:08 UTC · model grok-4.3

classification 💻 cs.CV
keywords infrared small target detection · moving target detection · spatio-temporal feedback · sparse semantic module · closed-loop refinement · multi-frame detection · computer vision · semantic propagation

The pith

A closed-loop feedback network with forward and backward refinement modules improves moving infrared small target detection by exchanging semantics across sparse frames.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces FeedbackSTS-Det to address insufficient spatio-temporal correlation and high computation costs in detecting moving infrared small targets. The network uses a closed-loop strategy where paired forward and backward refinement modules cooperate across encoder and decoder layers to share information between consecutive frames. An embedded sparse semantic module groups frames at fixed intervals, propagates semantics inside each group, and reassembles the sequence to handle long-range dependencies efficiently. Experiments on standard multi-frame infrared datasets show gains in accuracy and fewer false alarms along with better scene adaptability. If correct, the method would support more reliable real-world systems for surveillance and warning tasks.

Core claim

Our approach introduces a closed-loop spatio-temporal semantic feedback strategy with paired forward and backward refinement modules that work cooperatively across the encoder and decoder to enhance information exchange between consecutive frames, effectively improving detection accuracy and reducing false alarms. Moreover, we introduce an embedded sparse semantic module (SSM), which operates by strategically grouping frames by interval, propagating semantics within each group, and reassembling the sequence to efficiently capture long-range temporal dependencies with low computational overhead.

What carries the argument

Closed-loop spatio-temporal semantic feedback strategy using paired forward and backward refinement modules that cooperate in the encoder and decoder, combined with the sparse semantic module that groups frames by interval for semantic propagation and sequence reassembly.
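
The grouping, propagation, and reassembly steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the list-of-vectors frame representation, and the running-mean propagation are all assumptions standing in for the learned module, whose details are not given here. Only the interval grouping and the order-preserving reassembly follow the description above.

```python
from typing import List

Frame = List[float]  # hypothetical per-frame feature vector (assumption)


def split_by_interval(frames: List[Frame], interval: int) -> List[List[Frame]]:
    """Put frame t into group t % interval, so each group samples every interval-th frame."""
    return [frames[g::interval] for g in range(interval)]


def propagate_within_group(group: List[Frame]) -> List[Frame]:
    """Placeholder propagation (assumption): causal running mean inside one group.

    The real module is learned; this only shows information flowing
    between frames that are `interval` steps apart in the original sequence.
    """
    out: List[Frame] = []
    running: Frame = []
    for n, feat in enumerate(group, start=1):
        running = list(feat) if not running else [r + f for r, f in zip(running, feat)]
        out.append([r / n for r in running])
    return out


def reassemble(groups: List[List[Frame]], interval: int, num_frames: int) -> List[Frame]:
    """Interleave the groups back into the original temporal order."""
    out: List[Frame] = [[] for _ in range(num_frames)]
    for g, group in enumerate(groups):
        out[g::interval] = group
    return out


def sparse_semantic_sketch(frames: List[Frame], interval: int) -> List[Frame]:
    """Group by interval, propagate within each group, then restore frame order."""
    groups = split_by_interval(frames, interval)
    groups = [propagate_within_group(g) for g in groups]
    return reassemble(groups, interval, len(frames))
```

With an interval of 3 and seven frames, the groups hold frames (0, 3, 6), (1, 4), and (2, 5). Each group therefore spans most of the sequence while containing only about a third of its frames, which is the source of the claimed low overhead.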

If this is right

  • Higher detection accuracy results from improved frame-to-frame semantic exchange.
  • Fewer false alarms occur because refinement modules reinforce consistent target signals.
  • Long-range temporal dependencies are captured at low overhead via interval grouping.
  • Generalization improves across different scenes on standard multi-frame datasets.
  • Practical systems for missile warning and maritime surveillance become more reliable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The interval-grouping trick could be reused in other video detection pipelines that must handle long sequences without exploding memory use.
  • Closed-loop refinement might reduce error buildup in any sequential vision task where early-frame mistakes propagate.
  • The architecture could be paired with visible-spectrum inputs to create cross-modal detectors that inherit the same feedback efficiency.
  • Real-time streaming versions would need only minor changes to the reassembly step to process live camera feeds.

Load-bearing premise

The closed-loop feedback and interval-based sparse grouping will consistently capture relevant long-range temporal dependencies across varied real-world infrared scenes without introducing new errors.

What would settle it

A test on a fresh infrared dataset with novel motion patterns or heavy background clutter that shows no statistically significant improvement in precision or false-alarm rate over non-feedback baselines would falsify the claim.
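
One way to make "statistically significant" concrete is a paired sign-flip permutation test on per-sequence scores. The sketch below is an assumption about how such a check could be run, not a protocol from the paper; the function name, the resample count, and the example numbers are hypothetical.

```python
import random
from typing import Sequence


def paired_sign_flip_test(
    baseline: Sequence[float],
    candidate: Sequence[float],
    num_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the mean paired difference.

    Under the null hypothesis (no difference between methods), the sign of
    each per-sequence difference is arbitrary, so signs are flipped at random
    and the resampled mean is compared with the observed one.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (num_resamples + 1)


# Illustrative, made-up per-sequence precision scores (not results from the paper).
baseline_scores = [0.71, 0.64, 0.80, 0.58, 0.69, 0.75]
feedback_scores = [0.74, 0.63, 0.83, 0.61, 0.70, 0.79]
p_value = paired_sign_flip_test(baseline_scores, feedback_scores)
```

A large p-value on such a dataset would correspond to the "no significant improvement" outcome described above. Pairing by sequence matters because scene difficulty varies far more between sequences than between methods.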

Figures

Figures reproduced from arXiv: 2601.14690 by Aji Mao, Liang Xu, Qing Qin, Xiangyu Qiu, Xian Zhang, Yian Huang, Zhenming Peng.

Figure 1: Overall procedure of FeedbackSTS-Det model for ISTD. (a) presents the overall framework. (b) denotes forward spatio-temporal semantic refinement
Figure 2: Basic feedback module (BFBM). (a) presents the framework of
Figure 3: ROC curves on two benchmark datasets. (a) ROC Curve on NUDT
Figure 4: Qualitative results of different methods. For better visual presentation, the target regions are highlighted with red boxes and then displayed as zoomed-in
Figure 5: Comparison of different feedback design variants against the Full-FB. Evaluation metrics include mean Intersection over
Figure 10: Visual comparison of Part-FB1 versus Full-FB using feature maps
Figure 7: Visual comparison of Enc-NoFB versus Full-FB using feature maps
Figure 8: Visual comparison of All-Fwd versus Full-FB using feature maps
Figure 9: Visual comparison of All-Bwd versus Full-FB using feature maps
Figure 13: Visual results of the FSTSRM2 module outputs for frame
Figure 14: Visual results of the BSTSRM1 module outputs for frame
Figure 15: Visual results of the BSTSRM3 module outputs for frame
Original abstract

Infrared small target detection (ISTD) has been a critical technology in defense and civilian applications over the past several decades, such as missile warning, maritime surveillance, and disaster monitoring. Nevertheless, moving infrared small target detection still faces considerable challenges: existing models suffer from insufficient spatio-temporal semantic correlation and are not lightweight-friendly, while algorithms with strong scene generalization capability are in great demand for real-world applications. To address these issues, we propose FeedbackSTS-Det, a sparse frames-based spatio-temporal semantic feedback network. Our approach introduces a closed-loop spatio-temporal semantic feedback strategy with paired forward and backward refinement modules that work cooperatively across the encoder and decoder to enhance information exchange between consecutive frames, effectively improving detection accuracy and reducing false alarms. Moreover, we introduce an embedded sparse semantic module (SSM), which operates by strategically grouping frames by interval, propagating semantics within each group, and reassembling the sequence to efficiently capture long-range temporal dependencies with low computational overhead. Extensive experiments on many widely adopted multi-frame infrared small target datasets demonstrate the generalization ability and scene adaptability of our proposed network. Code and models are available at: https://github.com/IDIP-Lab/FeedbackSTS-Det.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces FeedbackSTS-Det, a sparse frames-based spatio-temporal semantic feedback network for moving infrared small target detection. It proposes a closed-loop feedback strategy using paired forward and backward refinement modules that cooperate across encoder and decoder to improve inter-frame information exchange, together with an embedded sparse semantic module (SSM) that groups frames by interval, propagates semantics within groups, and reassembles the sequence to capture long-range temporal dependencies at low cost. Experiments on widely used multi-frame infrared datasets are reported to show gains in detection accuracy and reductions in false alarms, with code released.

Significance. If the performance claims hold after proper isolation of components, the work would offer a lightweight architecture that addresses insufficient spatio-temporal correlation in existing ISTD models while remaining suitable for real-world generalization in defense and surveillance applications. The combination of closed-loop feedback and sparse grouping is a concrete attempt to balance accuracy and efficiency.

major comments (1)
  1. [Ablation studies (typically §4.3 or equivalent)] The central claim that the paired forward/backward refinement modules operating cooperatively in a closed loop are responsible for the reported accuracy gains and false-alarm reductions is load-bearing. An ablation that freezes or removes the backward pass while retaining the forward path, SSM, and base encoder-decoder is required to isolate its contribution from the spatio-temporal backbone or sparse grouping alone. Without this, improvements on standard datasets could be explained by other elements of the architecture.
minor comments (2)
  1. [Abstract] The abstract asserts performance gains but supplies no numerical metrics, dataset names, or error bars; a concise quantitative summary should be added for immediate readability.
  2. [Method (SSM description)] Notation for the SSM grouping interval and reassembly operation should be defined explicitly with a diagram or pseudocode to clarify how semantics are propagated within each group.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and the specific suggestion regarding ablation studies. We agree that isolating the contribution of the backward refinement module within the closed-loop feedback is important for substantiating the central claims. We will add the requested experiment to the revised manuscript.

Point-by-point responses
  1. Referee: [Ablation studies (typically §4.3 or equivalent)] The central claim that the paired forward/backward refinement modules operating cooperatively in a closed loop are responsible for the reported accuracy gains and false-alarm reductions is load-bearing. An ablation that freezes or removes the backward pass while retaining the forward path, SSM, and base encoder-decoder is required to isolate its contribution from the spatio-temporal backbone or sparse grouping alone. Without this, improvements on standard datasets could be explained by other elements of the architecture.

    Authors: We agree that a targeted ablation isolating the backward pass is necessary to strengthen the evidence for the closed-loop mechanism. The current manuscript reports ablations on the SSM and the overall feedback strategy but does not include the precise variant requested (backward pass disabled while retaining forward path, SSM, and base encoder-decoder). In the revision we will add this experiment to §4.3, comparing the full model against the forward-only variant on the same datasets and metrics. This will directly quantify the incremental benefit of the cooperative forward-backward interaction beyond the spatio-temporal backbone and sparse grouping. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture with no self-referential derivations

Full rationale

The paper presents FeedbackSTS-Det as a novel encoder-decoder network with forward/backward refinement modules and an embedded sparse semantic module (SSM). All performance claims rest on experimental results across standard multi-frame ISTD datasets rather than on a closed-form derivation, a fitted parameter renamed as a prediction, or a uniqueness theorem. No equation defines a quantity in terms of itself, no self-citation carries the central mechanism, and the SSM grouping/reassembly is described as an explicit design choice rather than imported from prior work. The argument is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep-learning assumptions about dataset representativeness and the effectiveness of the newly introduced modules; no free parameters or invented physical entities are specified in the abstract.

axioms (1)
  • domain assumption: Widely adopted multi-frame infrared small target datasets are representative of real-world scenes and sufficient to demonstrate generalization.
    The abstract invokes these datasets to support claims of scene adaptability.

pith-pipeline@v0.9.0 · 5529 in / 1249 out tokens · 52986 ms · 2026-05-16T12:08:56.991100+00:00 · methodology


