pith.

arxiv: 2508.10678 · v2 · pith:JZ3PHMO4 · submitted 2025-08-14 · 💻 cs.CV

HyperTea: A Hypergraph-based Temporal Enhancement and Alignment Network for Moving Infrared Small Target Detection

Pith reviewed 2026-05-21 22:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords moving infrared small target detection · hypergraph neural networks · temporal enhancement · feature alignment · CNN-RNN integration · MIRSTD · spatiotemporal correlations

The pith

HyperTea integrates hypergraphs with CNNs and RNNs to model high-order spatiotemporal correlations for moving infrared small target detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HyperTea to tackle the challenges of small size, weak intensity, and complex motions in moving infrared small target detection. It combines global temporal enhancement through semantic aggregation, local motion pattern capture between frames, and cross-scale alignment to improve feature representation at multiple timescales. A sympathetic reader would care because existing methods rely on low-order correlations that often fail in practical surveillance or defense scenarios, and this approach claims to deliver state-of-the-art results on standard benchmarks by capturing richer relations via hypergraphs.

Core claim

HyperTea is, by the authors' account, the first network to fuse CNNs for spatial features, RNNs for sequential context, and hypergraph neural networks for high-order spatiotemporal correlations in MIRSTD. The architecture uses a global temporal enhancement module to aggregate and propagate semantic context across the sequence, a local temporal enhancement module to model motion between adjacent frames, and a temporal alignment module to correct cross-scale feature misalignment. The authors report superior detection performance on the DAUB and IRDST datasets.

What carries the argument

HyperTea architecture with global temporal enhancement module (GTEM) for semantic aggregation and propagation, local temporal enhancement module (LTEM) for adjacent-frame motion patterns, and temporal alignment module (TAM) for cross-scale correction, all built on hypergraph neural networks to model high-order feature correlations.
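For readers new to hypergraph neural networks, here is a minimal NumPy sketch of one generic hypergraph convolution in the row-normalised HGNN-style form $X' = \sigma(D_v^{-1} H W D_e^{-1} H^\top X \Theta)$: features are first averaged within each hyperedge, then averaged back onto the vertices. It is not HyperTea's GTEM or LTEM; the function name `hypergraph_conv`, the unit hyperedge weights, the ReLU, and the toy incidence matrix are illustrative assumptions.

```python
import numpy as np


def hypergraph_conv(X, H, Theta, w=None):
    """One hypergraph convolution step (illustrative sketch, not HyperTea's module).

    X     : (N, C_in) vertex features, e.g. flattened CNN feature-map positions
    H     : (N, E) incidence matrix, H[v, e] = 1 if vertex v lies in hyperedge e
    Theta : (C_in, C_out) projection (learnable in a real network)
    w     : (E,) hyperedge weights; all ones if omitted (assumption)
    Assumes every vertex lies in at least one hyperedge and no hyperedge is empty.
    """
    w = np.ones(H.shape[1]) if w is None else w
    d_e = H.sum(axis=0)                        # vertices per hyperedge
    d_v = H @ w                                # weighted hyperedge count per vertex
    edge_feat = (H.T @ X) / d_e[:, None]       # vertex -> hyperedge: mean of members
    vert_feat = (H @ (w[:, None] * edge_feat)) / d_v[:, None]  # hyperedge -> vertex
    return np.maximum(vert_feat @ Theta, 0.0)  # ReLU nonlinearity


# Toy example: hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
out = hypergraph_conv(X, H, Theta=np.eye(1))
print(out.ravel().tolist())  # [2.0, 2.0, 2.75, 3.5]
```

Vertices 0 and 1 end up with identical features because they share exactly one hyperedge: the group {0, 1, 2} is aggregated as a single unit rather than as separate pairwise links, which is the sense in which hyperedges carry "high-order" correlations.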

If this is right

  • Detection accuracy improves for targets with irregular trajectories by explicitly modeling relations beyond pairwise frame connections.
  • Multi-timescale feature enhancement reduces missed detections in low-signal infrared video.
  • Cross-scale alignment mitigates errors when combining global and local temporal information.
  • The combined CNN-RNN-HGNN pipeline sets a new performance baseline on existing MIRSTD benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hypergraph temporal modules could be tested on visible-light small-object tracking to check if the high-order benefit transfers across modalities.
  • If the alignment module proves critical, it might be adapted as a lightweight plug-in for other multi-scale video networks.
  • Real-world deployment would require checking whether the added hypergraph computation remains feasible under strict latency constraints typical of infrared sensors.

Load-bearing premise

High-order spatiotemporal correlations from hypergraphs applied to CNN-extracted features will consistently outperform lower-order temporal models when handling complex motion patterns of small infrared targets.

What would settle it

Running the model on a new infrared sequence dataset containing highly erratic or non-smooth target motions and finding that detection precision or recall drops below that of a strong RNN-only or graph-convolution baseline would falsify the core performance claim.
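The falsification test above comes down to comparing precision and recall against a baseline on the same frames. A minimal sketch of how those two numbers are commonly computed for box detections, assuming greedy one-to-one matching by descending score at an IoU threshold of 0.5 and boxes given as (x1, y1, x2, y2); the function names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, thr=0.5):
    """Greedily match scored predictions to ground-truth boxes, one-to-one.

    preds: list of (score, box); gts: list of boxes. Returns (precision, recall).
    """
    matched, tp = set(), 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_iou = None, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue                      # each ground truth is matched once
            v = iou(box, gt)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0   # tp / (tp + fp)
    recall = tp / len(gts) if gts else 0.0          # tp / (tp + fn)
    return precision, recall


gts = [(10, 10, 14, 14), (50, 50, 53, 53)]
preds = [(0.9, (10, 10, 14, 14)), (0.8, (30, 30, 33, 33)), (0.6, (50, 51, 53, 54))]
p, r = precision_recall(preds, gts)
print(round(p, 3), r)  # 0.667 1.0
```

For targets only a few pixels wide, a one-pixel offset already pulls IoU down to the threshold (the third prediction above scores exactly 0.5), which is one reason some infrared small-target work also reports Pd/Fa alongside box metrics.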

Figures

Figures reproduced from arXiv: 2508.10678 by Jie Tang, Weihua Gao, Wenlong Niu, Xiaodong Peng, Yun Li, Zhaoyuan Qi.

Figure 1. Overview of the proposed framework HyperTea. Our HyperTea consists of a backbone and three key modules: the …
Figure 2. Simplified workflow of our HyperTea. It contains the …
Figure 3. Details of our proposed CSAM.
… them can be captured, rather than being confined solely to the current temporal scale. In addition, with the aim of preserving more original information, we also incorporate residual blocks to embed keyframe features at different temporal scales into the query results R:
$$R = \mathrm{LN}\Big(G_T + \mathrm{UP}\big(\mathrm{LN}(\mathrm{Conv}_{1\times 1}(\mathrm{Attn}) + \hat{L}_{st})\big)\Big) \qquad (14)$$
where LN is layer normalization, UP denotes the reconstruc…
Figure 4. Visualization comparisons of 14 methods on IRDST, with 72/266.bmp. GT is ground truth. Red and blue boxes represent …
Figure 5. Visualization comparisons of 14 methods on IRDST, with 6/14.bmp. GT is ground truth. Red and blue boxes represent …
Figure 6. Visualization comparisons of 14 methods on DAUB, with 21/433.bmp. GT is ground truth. Red and blue boxes represent …
Figure 7. PR curves of 16 representative detection methods on …
Figure 8. Effects of time window size T on HyperTea.
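Eq. (14) is the only formula that survives in the figure excerpts. A minimal NumPy sketch of it, under loudly stated assumptions: the term written GT in the excerpt is read as a feature map G_T; LN normalises over channels at each pixel with no affine terms; Conv1×1 is per-pixel channel mixing; and UP, whose definition is truncated ("the reconstruc…"), is replaced by a hypothetical nearest-neighbour upsampling (set `factor=1` if it is only a reshape). All names are illustrative, not the authors' code.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """LayerNorm over the channel axis (axis 0) of a (C, H, W) map, no affine terms (assumption)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def conv1x1(x, weight):
    """1x1 convolution as per-pixel channel mixing; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def up(x, factor):
    """HYPOTHETICAL stand-in for UP: nearest-neighbour upsampling by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def query_result(G_T, attn, L_hat_st, weight, factor=1):
    """R = LN(G_T + UP(LN(Conv1x1(Attn) + L_hat_st))), following Eq. (14) as excerpted."""
    inner = layer_norm(conv1x1(attn, weight) + L_hat_st)  # inner residual sum
    return layer_norm(G_T + up(inner, factor))            # outer residual with G_T


rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
attn = rng.standard_normal((C, H, W))
L_hat_st = rng.standard_normal((C, H, W))
G_T = rng.standard_normal((C, 2 * H, 2 * W))
R = query_result(G_T, attn, L_hat_st, weight=np.eye(C), factor=2)
print(R.shape)  # (8, 8, 8)
```

Both additions are residual paths, so the enhanced attention output refines, rather than replaces, the features it is added to.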
Original abstract

In practical application scenarios, moving infrared small target detection (MIRSTD) remains highly challenging due to the target's small size, weak intensity, and complex motion pattern. Existing methods typically only model low-order correlations between feature nodes and perform feature extraction and enhancement within a single temporal scale. Although hypergraphs have been widely used for high-order correlation learning, they have received limited attention in MIRSTD. To explore the potential of hypergraphs and enhance multi-timescale feature representation, we propose HyperTea, which integrates global and local temporal perspectives to effectively model high-order spatiotemporal correlations of features. HyperTea consists of three modules: the global temporal enhancement module (GTEM) realizes global temporal context enhancement through semantic aggregation and propagation; the local temporal enhancement module (LTEM) is designed to capture local motion patterns between adjacent frames and then enhance local temporal context; additionally, we further develop a temporal alignment module (TAM) to address potential cross-scale feature misalignment. To our best knowledge, HyperTea is the first work to integrate convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hypergraph neural networks (HGNNs) for MIRSTD, significantly improving detection performance. Experiments on DAUB and IRDST demonstrate its state-of-the-art (SOTA) performance. Our source codes are available at https://github.com/Lurenjia-LRJ/HyperTea.
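The abstract separates a global temporal view (context across the sequence) from a local one (adjacent frames), and Figure 8 studies a time window of size T. A hypothetical sketch of how a clip could be sliced into those two views, assuming the window is the last T frames ending at the keyframe; the function name and layout are illustrative, not taken from the released code.

```python
def temporal_views(frames, T):
    """Return a global window of the last T frames and its adjacent-frame pairs.

    Hypothetical layout: the keyframe is assumed to be the last frame of the window.
    """
    if T < 2 or len(frames) < T:
        raise ValueError("need T >= 2 and at least T frames")
    window = frames[-T:]                          # global view: whole window
    pairs = list(zip(window[:-1], window[1:]))    # local view: (t-1, t) pairs
    return window, pairs


frames = list(range(10))  # stand-ins for 10 consecutive frames
window, pairs = temporal_views(frames, T=5)
print(window)  # [5, 6, 7, 8, 9]
print(pairs)   # [(5, 6), (6, 7), (7, 8), (8, 9)]
```

Growing T enlarges the global context linearly while each local pair still spans only one frame step, which is why T is a natural axis for the sensitivity study in Figure 8.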

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes HyperTea, a network for moving infrared small target detection (MIRSTD) that integrates CNN feature extraction, RNN-based temporal modeling, and HGNNs for high-order spatiotemporal correlations. It introduces three modules: GTEM for global temporal context enhancement via semantic aggregation and propagation, LTEM to capture local motion patterns between adjacent frames, and TAM to address cross-scale feature misalignment. The authors claim this is the first such CNN-RNN-HGNN integration for the task and report SOTA performance on the DAUB and IRDST datasets, with code released.

Significance. If the hypergraph components demonstrably outperform strong low-order temporal baselines on the same backbone, the work could advance MIRSTD by showing the value of high-order correlation modeling for complex target motions. The open-source code is a clear strength for reproducibility.

major comments (2)
  1. [Experiments] Experimental section / ablation studies: no controls replace the hypergraph modules (GTEM/LTEM) with strong low-order alternatives such as multi-head self-attention or advanced multi-scale RNN variants on the identical CNN backbone. Without these, gains cannot be attributed specifically to high-order hyperedge modeling rather than general temporal enhancement, undermining the central motivation.
  2. [Method] §3.2 and §3.3 (GTEM and LTEM descriptions): the claim that hypergraphs reliably capture high-order correlations and outperform lower-order modeling is asserted in the motivation but not isolated through targeted ablations or comparisons; this claim is load-bearing for the novelty of the HGNN integration.
minor comments (2)
  1. [Abstract] Abstract: reports SOTA without any numerical margins, dataset-specific metrics, or baseline comparisons; adding one sentence with key improvements (e.g., mAP or Pd/Fa deltas) would improve clarity.
  2. [Method] Hypergraph construction: details on how hyperedges are formed from CNN features (including any temporal scale hyperparameters) are insufficiently specified for exact reproduction.
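The first major comment asks for a low-order control swapped in for GTEM/LTEM on the same backbone. A minimal NumPy sketch of one such control, standard multi-head self-attention over N feature nodes with a residual connection, so that it maps (N, C) to (N, C); the function name, the residual, and the random weights are assumptions for illustration, not the authors' ablation code.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mhsa(X, Wq, Wk, Wv, Wo, num_heads):
    """Multi-head self-attention over N nodes; every weight couples exactly two nodes.

    X: (N, C); Wq, Wk, Wv, Wo: (C, C); C must be divisible by num_heads.
    """
    N, C = X.shape
    assert C % num_heads == 0, "C must be divisible by num_heads"
    d = C // num_heads

    def split(M):  # (N, C) -> (heads, N, d)
        return M.reshape(N, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (heads, N, N) pairwise scores
    out = softmax(scores, axis=-1) @ v              # (heads, N, d)
    out = out.transpose(1, 0, 2).reshape(N, C)      # merge heads back to (N, C)
    return X + out @ Wo                             # residual keeps the (N, C) interface


rng = np.random.default_rng(0)
N, C = 6, 8
X = rng.standard_normal((N, C))
Wq, Wk, Wv, Wo = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
Y = mhsa(X, Wq, Wk, Wv, Wo, num_heads=2)
print(Y.shape)  # (6, 8)
```

The (N, N) score matrix is what makes this "low-order": each attention weight links one pair of nodes, whereas a hyperedge aggregates a whole vertex set before redistributing. Matching the input/output shape is what would let such a block replace a hypergraph module while the CNN backbone stays fixed.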

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below and will revise the manuscript to strengthen the experimental validation of the hypergraph components.

  1. Referee: [Experiments] Experimental section / ablation studies: no controls replace the hypergraph modules (GTEM/LTEM) with strong low-order alternatives such as multi-head self-attention or advanced multi-scale RNN variants on the identical CNN backbone. Without these, gains cannot be attributed specifically to high-order hyperedge modeling rather than general temporal enhancement, undermining the central motivation.

    Authors: We agree that the current ablations do not fully isolate the contribution of high-order hyperedge modeling. In the revised version we will add new experiments that replace the GTEM and LTEM hypergraph modules with strong low-order baselines (multi-head self-attention and advanced multi-scale RNN variants) while keeping the identical CNN backbone and all other components fixed. These results will be reported in an expanded ablation table. revision: yes

  2. Referee: [Method] §3.2 and §3.3 (GTEM and LTEM descriptions): the claim that hypergraphs reliably capture high-order correlations outperforming lower-order modeling is asserted in the motivation but not isolated via targeted ablations or comparisons; this is load-bearing for the novelty of the HGNN integration.

    Authors: We acknowledge that the motivation section asserts the benefit of high-order modeling without dedicated isolation experiments. We will insert targeted ablation studies in the revision that directly compare hypergraph-based GTEM/LTEM against their low-order counterparts on the same backbone, thereby providing empirical support for the novelty claim of the CNN-RNN-HGNN integration. revision: yes

Circularity Check

0 steps flagged

No circularity: architectural proposal evaluated on external datasets


The paper proposes HyperTea as an integration of CNNs, RNNs, and HGNNs with modules GTEM, LTEM, and TAM to model high-order spatiotemporal correlations for MIRSTD. Claims of novelty and SOTA performance rest on experiments using public external datasets (DAUB, IRDST) rather than any self-referential equations or fitted parameters. No derivation reduces reported metrics to inputs by construction, and no load-bearing self-citations or uniqueness theorems from prior author work are invoked in the provided text. The central contribution is an empirical architectural design whose validity is tested independently of its own definitions.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 3 invented entities

The method rests on standard deep-learning assumptions plus the domain-specific premise that hypergraphs are an effective inductive bias for infrared motion; no new physical entities are postulated.

free parameters (2)
  • hypergraph construction hyperparameters
    Number of hyperedges, vertex grouping strategy, and aggregation weights are chosen during architecture design and training.
  • temporal scale parameters
    Window sizes for global versus local modules and alignment offsets are selected to fit the target motion statistics.
axioms (2)
  • domain assumption: High-order correlations among feature nodes improve detection of complex motion patterns over pairwise modeling
    Invoked in the introduction when motivating hypergraphs for MIRSTD.
  • domain assumption: Cross-scale feature misalignment can be corrected by a dedicated alignment module without introducing new artifacts
    Stated as motivation for the TAM module.
invented entities (3)
  • Global Temporal Enhancement Module (GTEM) · no independent evidence
    purpose: Semantic aggregation and propagation across the entire sequence
    New architectural component introduced to realize global temporal context.
  • Local Temporal Enhancement Module (LTEM) · no independent evidence
    purpose: Capture local motion patterns between adjacent frames
    New architectural component for local temporal context.
  • Temporal Alignment Module (TAM) · no independent evidence
    purpose: Address potential cross-scale feature misalignment
    New component to keep multi-timescale features consistent.
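The ledger counts hypergraph construction hyperparameters (number of hyperedges, vertex grouping strategy) as free parameters, and the referee's second minor comment notes they are underspecified. A minimal sketch of one common construction from the hypergraph literature, one hyperedge per vertex made of that vertex plus its k nearest neighbours in feature space; whether HyperTea uses k-NN, a distance threshold, or something else is not stated here, and the function name is hypothetical.

```python
import numpy as np


def knn_incidence(X, k):
    """Build an (N, N) incidence matrix: hyperedge e is centred on vertex e and
    contains e plus its k nearest neighbours (Euclidean distance in feature space).

    X: (N, C) float vertex features; k: neighbours per hyperedge (a free hyperparameter).
    """
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(dist, -np.inf)                    # each centre sorts first in its own row
    nbrs = np.argsort(dist, axis=1)[:, : k + 1]        # centre + k nearest neighbours
    H = np.zeros((X.shape[0], X.shape[0]))
    for e, members in enumerate(nbrs):
        H[members, e] = 1.0                            # column e lists hyperedge e's members
    return H


X = np.array([[0.0], [1.0], [10.0], [11.0]])
H = knn_incidence(X, k=1)
print(H.astype(int).tolist())  # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
```

With k = 1, mutual nearest neighbours yield duplicate hyperedges (columns 0 and 1 above), which some implementations deduplicate. The choice of k, and whether a hyperedge may span positions from several frames, are exactly the kind of construction and temporal-scale settings that would need to be reported for exact reproduction.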

pith-pipeline@v0.9.0 · 5797 in / 1614 out tokens · 34004 ms · 2026-05-21T22:47:28.852578+00:00 · methodology


