pith. machine review for the scientific record.

arxiv: 2601.09240 · v2 · submitted 2026-01-14 · 💻 cs.CV · eess.IV

Recognition: 1 theorem link · Lean Theorem

DeTracker: Motion-decoupled Vehicle Detection and Tracking in Unstabilized Satellite Videos

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 14:40 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords vehicle tracking · satellite video · motion decoupling · multi-object tracking · unstabilized video · tiny object detection · temporal feature fusion

The pith

DeTracker decouples dominant platform motion from weak target motion to track tiny vehicles in jittery satellite videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops DeTracker as a joint detection and tracking system for satellite videos where platform jitter overwhelms the faint signals of small moving vehicles. Its Global-Local Motion Decoupling module aligns features globally to cancel background motion while refining local patches to preserve target trajectories. A Temporal Dependency Feature Pyramid fuses information across frames to strengthen the representation of tiny objects whose appearance alone is unreliable. The authors also release SDM-Car-SU, a benchmark that injects controlled multi-directional and multi-speed platform motions into simulated scenes. On both this dataset and real unstabilized satellite sequences the method records higher MOTA scores than prior trackers.
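The temporal-fusion intuition behind that cross-frame step can be pictured with a toy recurrent blend, an exponential moving average over already-aligned frames. This is purely an editorial sketch of why accumulation helps tiny objects, not the authors' TDFP module:

```python
import numpy as np

def temporal_fuse(feats, decay=0.6):
    """Blend each frame's (already aligned) feature map with an
    exponential moving average of earlier frames, so the weak but
    persistent response of a tiny object accumulates while
    per-frame noise averages out."""
    fused, state = [], None
    for f in feats:
        state = f.copy() if state is None else decay * state + (1 - decay) * f
        fused.append(state)
    return fused

# A faint stationary target (post-alignment) buried in per-frame noise.
rng = np.random.default_rng(0)
target = np.zeros((32, 32))
target[14:18, 14:18] = 0.3
frames = [target + rng.normal(0.0, 0.2, (32, 32)) for _ in range(8)]
fused = temporal_fuse(frames)
```

In the fused maps the background noise standard deviation shrinks while the target response stays near 0.3, which is the effect the paper attributes to its temporal pyramid, only here in scalar form.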

Core claim

DeTracker decouples motion in unstabilized satellite videos through global semantic alignment at the feature level, which suppresses dominant platform motion, and local refinement, which captures target-specific motion. Combined with cross-frame temporal fusion that strengthens the continuity of weak vehicle signals, this yields more stable trajectories and more consistent identities.

What carries the argument

The Global-Local Motion Decoupling (GLMD) module, which suppresses background-dominated motion via global semantic alignment and captures target motion through local refinement.
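In image space the global-local split can be sketched with phase correlation, a deliberately crude stand-in for the GLMD's learned feature-level alignment; the pure-translation jitter and the toy scene are both simplifying assumptions:

```python
import numpy as np

def global_shift(a, b):
    """Dominant integer translation taking frame a to frame b,
    estimated by phase correlation over the whole frame."""
    R = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    corr = np.fft.ifft2(R / (np.abs(R) + 1e-9)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    if dy > h // 2:                        # wrap-around peaks are negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def decouple(a, b, centers, win=8):
    """Cancel the global platform shift, then re-estimate the small
    residual (target-specific) motion inside each local window."""
    gy, gx = global_shift(a, b)
    b_aligned = np.roll(b, (-gy, -gx), axis=(0, 1))
    residuals = []
    for y, x in centers:
        pa = a[y - win:y + win, x - win:x + win]
        pb = b_aligned[y - win:y + win, x - win:x + win]
        residuals.append(global_shift(pa, pb))
    return (gy, gx), residuals

# Toy scene: textured ground, a flat road patch, one bright vehicle.
rng = np.random.default_rng(0)
bg = rng.random((96, 96))
bg[24:56, 24:56] = 0.0                     # flat road region
f1 = bg.copy()
f1[38:42, 38:42] = 2.0                     # vehicle
f2 = np.roll(bg, (5, 7), axis=(0, 1))      # platform jitter of (+5, +7)
f2[44:48, 45:49] = 2.0                     # vehicle moved a further (+1, 0)

platform, residuals = decouple(f1, f2, [(40, 40)])
```

On this scene the sketch recovers the platform shift (5, 7) and the residual vehicle motion (1, 0); GLMD performs the analogous split with learned alignment on features rather than FFT correlation on pixels.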

If this is right

  • Trajectory stability increases when global alignment removes the bulk of platform-induced displacement before local tracking begins.
  • Identity consistency improves for vehicles whose appearance is too weak to rely on alone.
  • Temporal feature fusion across frames raises the discriminability of tiny objects under varying motion speeds.
  • The SDM-Car-SU benchmark allows direct measurement of robustness across different jitter directions and velocities.
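A benchmark of this kind can be emulated in a few lines: shift stabilized frames by a directional drift plus random per-frame jitter. The direction, speed, and noise scale below are hypothetical parameters, not SDM-Car-SU's actual generation protocol:

```python
import numpy as np

def inject_jitter(frames, direction_deg, speed, seed=0):
    """Apply a cumulative platform drift of the given direction and
    speed (pixels/frame) plus random per-frame jitter. Integer shifts
    via np.roll, so content wraps at the borders (a simplification)."""
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(direction_deg)
    drift = np.array([np.sin(theta), np.cos(theta)]) * speed   # (dy, dx)
    shaken, offsets, pos = [], [], np.zeros(2)
    for f in frames:
        pos = pos + drift + rng.normal(0.0, 0.5, size=2)       # drift + jitter
        off = np.round(pos).astype(int)
        shaken.append(np.roll(f, tuple(off), axis=(0, 1)))
        offsets.append((int(off[0]), int(off[1])))
    return shaken, offsets

frames = [np.full((64, 64), float(i)) for i in range(10)]
shaky, offsets = inject_jitter(frames, direction_deg=45.0, speed=2.0)
```

Sweeping `direction_deg` and `speed` over a grid, while keeping the recorded `offsets` as ground-truth platform motion, is what would let a benchmark measure robustness per jitter direction and velocity.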

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same global-local split could be tested on drone footage or handheld camera sequences where camera shake dominates small-object motion.
  • If feature-level alignment proves sufficient, future work might replace explicit stabilization preprocessing with this learned decoupling step.
  • The approach suggests that any tracking task with a strong background motion bias may benefit from explicit semantic suppression before local association.

Load-bearing premise

The controlled platform motions added to the SDM-Car-SU dataset match the statistical structure of real satellite jitter without discarding useful target signals.

What would settle it

Real unstabilized satellite sequences in which DeTracker produces more identity switches or trajectory breaks than a baseline that does not separate global and local motion.

Figures

Figures reproduced from arXiv: 2601.09240 by Jiajun Chen, Jing Xiao, Jun Pan, Liang Liao, Mi Wang, Shaohan Cao, Yuming Zhu.

Figure 1: Major challenges of object tracking in unstabilized satellite videos. (a) Motion decoupling ambiguity: platform-induced jitter introduces complex … [figures/full_fig_p002_1.png]

Figure 2: Visualization of inter-frame motion and annotation mapping under unstabilized conditions. The left columns present the frame-wise annotation mapping … [figures/full_fig_p004_2.png]

Figure 3: Overview of the proposed DeTracker. The framework consists of three components: (1) a feature extraction backbone for acquiring multi-scale spatial … [figures/full_fig_p006_3.png]

Figure 4: The structure of the GLMD module comprises a global alignment … [figures/full_fig_p006_4.png]

Figure 5: Examples of detection results for vehicles moving in different directions. Rows one to three show frames 50, 120, and 189, respectively. [figures/full_fig_p010_5.png]

Figure 6: Examples of tracking trajectories; the motion trajectory of each object is shown in a distinct color within the same frame. [figures/full_fig_p010_6.png]

Figure 7: Visualization of the GLMD effect. Before alignment, the feature difference map between two consecutive frames contains many highlighted regions … [figures/full_fig_p011_7.png]

Figure 8: Feature map visualization results after incorporating different modules. [figures/full_fig_p011_8.png]
Original abstract

Satellite videos provide continuous observations of surface dynamics but pose significant challenges for multi-object tracking (MOT), especially under unstabilized conditions where platform jitter and the weak appearance of tiny objects jointly degrade tracking performance. To address this problem, we propose DeTracker, a joint-detection-and-tracking framework tailored for unstabilized satellite videos. DeTracker introduces a task-driven Global-Local Motion Decoupling (GLMD) module to address the motion imbalance between dominant platform motion and weak target motion. It suppresses background-dominated motion via global semantic alignment at the feature level and captures target-specific motion through local refinement, improving trajectory stability and identity consistency. In addition, a Temporal Dependency Feature Pyramid (TDFP) module is developed to perform cross-frame temporal feature fusion, enhancing the continuity and discriminability of tiny-object representations. We further construct a new benchmark dataset, SDM-Car-SU, which simulates multi-directional and multi-speed platform motions to enable systematic evaluation of tracking robustness under varying motion perturbations. Extensive experiments on both simulated and real unstabilized satellite videos demonstrate that DeTracker significantly outperforms existing methods, achieving 61.1% MOTA on SDM-Car-SU and 45.3% MOTA on real satellite video data. The code and dataset will be publicly available at https://github.com/alex-chenjiajun/DeTracker.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents DeTracker, a joint detection-and-tracking framework for vehicles in unstabilized satellite videos. It introduces a Global-Local Motion Decoupling (GLMD) module that performs global semantic alignment to suppress dominant platform motion while refining local target motion, a Temporal Dependency Feature Pyramid (TDFP) for cross-frame temporal feature fusion, and a new simulated benchmark SDM-Car-SU that injects multi-directional and multi-speed platform motions. Experiments report 61.1% MOTA on SDM-Car-SU and 45.3% MOTA on real unstabilized sequences, with claims of significant outperformance over existing methods.
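For reference, MOTA is the standard CLEAR-MOT score, MOTA = 1 - (FN + FP + IDSW) / GT. A minimal sketch, with hypothetical error counts chosen only to land on the paper's 61.1% headline:

```python
def mota(fn, fp, idsw, num_gt):
    """CLEAR-MOT accuracy: 1 - (misses + false positives + identity
    switches) divided by total ground-truth object instances."""
    return 1.0 - (fn + fp + idsw) / num_gt

# Hypothetical counts; many different error mixes yield the same score.
score = mota(fn=3000, fp=800, idsw=90, num_gt=10000)   # 0.611
```

The last point is the reason a single MOTA figure is hard to interpret: the metric pools misses, false positives, and identity switches, so two trackers with identical MOTA can fail in very different ways.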

Significance. If the central claims hold, the work provides a practical engineering advance for MOT under platform jitter and weak target appearance, a setting relevant to satellite-based monitoring. The explicit commitment to release code and the SDM-Car-SU dataset is a clear strength that supports reproducibility and future benchmarking.

major comments (2)
  1. [Dataset Construction] Dataset section: the assertion that SDM-Car-SU faithfully reproduces real unstabilized satellite jitter is load-bearing for both the GLMD module's reported gains and the generalization claim to real data, yet no quantitative matching (e.g., amplitude histograms, frequency spectra, or spatial correlation statistics of optical-flow vectors) between simulated and real sequences is provided.
  2. [Experiments] Experiments section: the headline MOTA figures (61.1% on SDM-Car-SU, 45.3% on real data) are presented without reported ablations isolating GLMD versus TDFP, without baseline implementation details or hyper-parameter settings, and without error bars or statistical significance tests, leaving the robustness of the outperformance claim difficult to evaluate.
minor comments (2)
  1. [Method] Notation in the GLMD description could be clarified by explicitly defining the global alignment loss and the local refinement operator before their first use.
  2. [Qualitative Results] Figure captions for the qualitative results should state the exact frame indices and motion parameters used so readers can replicate the visualized conditions.
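The quantitative matching the first major comment asks for could start from something as simple as comparing amplitude histograms and jitter spectra of per-frame motion traces. The traces, sampling rate, and bin settings below are illustrative, not the paper's data:

```python
import numpy as np

def jitter_stats(offsets, fps=10.0):
    """Per-frame motion trace -> amplitude histogram of inter-frame
    displacement plus the power spectrum of the (mean-removed) trace."""
    offsets = np.asarray(offsets, dtype=float)              # (T, 2): dy, dx
    amp = np.linalg.norm(np.diff(offsets, axis=0), axis=1)  # step amplitudes
    hist, _ = np.histogram(amp, bins=10, range=(0.0, 5.0), density=True)
    detrended = offsets - offsets.mean(axis=0)
    power = np.abs(np.fft.rfft(detrended[:, 0])) ** 2       # vertical jitter
    freqs = np.fft.rfftfreq(len(offsets), d=1.0 / fps)
    return hist, power, freqs

def hist_overlap(h1, h2, bin_width=0.5):
    """Histogram intersection in [0, 1]; 1 means identical amplitude stats.
    bin_width must match the histogram above (5.0 range / 10 bins)."""
    return float(np.minimum(h1, h2).sum() * bin_width)

rng = np.random.default_rng(0)
sim_trace = np.cumsum(rng.normal(0.0, 0.3, size=(50, 2)), axis=0)
real_trace = np.cumsum(rng.normal(0.0, 0.3, size=(50, 2)), axis=0)
h_sim, p_sim, f = jitter_stats(sim_trace)
h_real, _, _ = jitter_stats(real_trace)
match = hist_overlap(h_sim, h_real)
```

Reporting such an overlap (and the spectra side by side) for SDM-Car-SU versus real Luojia-style sequences would make the simulation-fidelity claim checkable rather than asserted.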

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Dataset Construction] Dataset section: the assertion that SDM-Car-SU faithfully reproduces real unstabilized satellite jitter is load-bearing for both the GLMD module's reported gains and the generalization claim to real data, yet no quantitative matching (e.g., amplitude histograms, frequency spectra, or spatial correlation statistics of optical-flow vectors) between simulated and real sequences is provided.

    Authors: We agree that quantitative validation would strengthen the claims regarding simulation fidelity. In the revised manuscript, we will add direct comparisons between SDM-Car-SU and real sequences, including amplitude histograms of platform motion, frequency spectra of jitter, and spatial correlation statistics of optical-flow vectors. This will provide explicit evidence supporting the simulation's realism. revision: yes

  2. Referee: [Experiments] Experiments section: the headline MOTA figures (61.1% on SDM-Car-SU, 45.3% on real data) are presented without reported ablations isolating GLMD versus TDFP, without baseline implementation details or hyper-parameter settings, and without error bars or statistical significance tests, leaving the robustness of the outperformance claim difficult to evaluate.

    Authors: We acknowledge the value of additional experimental details for assessing robustness. We will expand the experiments section to include ablations isolating the contributions of GLMD and TDFP, full baseline implementation details with hyper-parameter settings, and results reported with error bars from multiple runs together with statistical significance tests. revision: yes
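The promised significance testing could take the shape of a paired bootstrap over per-sequence scores; all numbers below are hypothetical placeholders, not the paper's results:

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_boot=10000, seed=0):
    """Probability that method A beats method B under resampling of
    per-sequence score differences: a simple paired significance check."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    return float((diffs[idx].mean(axis=1) > 0).mean())

# Hypothetical per-sequence MOTA for a method and a baseline.
ours = [0.62, 0.59, 0.64, 0.58, 0.63]
baseline = [0.55, 0.57, 0.52, 0.56, 0.54]
p_win = paired_bootstrap(ours, baseline)
```

A paired test is the right shape here because both trackers are scored on the same sequences; pooling unpaired scores would waste exactly the per-sequence variation that jitter severity induces.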

Circularity Check

0 steps flagged

No circularity: empirical method with held-out validation

Full rationale

The paper proposes two new modules (GLMD for global-local motion decoupling via semantic alignment and local refinement, TDFP for cross-frame temporal fusion) and a new simulated dataset SDM-Car-SU to evaluate tracking under multi-directional platform jitter. Performance is reported as empirical MOTA on held-out simulated data (61.1%) and separate real satellite sequences (45.3%). No derivation, equation, or claim reduces by construction to a fitted parameter, self-definition, or self-citation chain; the central results are measured outcomes against external test sets rather than tautological outputs of the same inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard deep learning assumptions for feature extraction and motion modeling without introducing new physical entities or free parameters beyond typical network hyperparameters.

axioms (1)
  • domain assumption Feature-level alignment and pyramid fusion can reliably separate dominant platform motion from weak target motion in satellite imagery.
    Invoked in the description of the GLMD and TDFP modules as the basis for improved trajectory stability.

pith-pipeline@v0.9.0 · 5552 in / 1189 out tokens · 55939 ms · 2026-05-16T14:40:57.843448+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages
