pith. machine review for the scientific record.

arxiv: 2601.09240 · v2 · submitted 2026-01-14 · 💻 cs.CV · eess.IV

Recognition: 1 theorem link · Lean Theorem

DeTracker: Motion-decoupled Vehicle Detection and Tracking in Unstabilized Satellite Videos

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 14:40 UTC · model grok-4.3

classification 💻 cs.CV eess.IV
keywords vehicle tracking · satellite video · motion decoupling · multi-object tracking · unstabilized video · tiny object detection · temporal feature fusion

The pith

DeTracker decouples dominant platform motion from weak target motion to track tiny vehicles in jittery satellite videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops DeTracker as a joint detection and tracking system for satellite videos where platform jitter overwhelms the faint signals of small moving vehicles. Its Global-Local Motion Decoupling module aligns features globally to cancel background motion while refining local patches to preserve target trajectories. A Temporal Dependency Feature Pyramid fuses information across frames to strengthen the representation of tiny objects whose appearance alone is unreliable. The authors also release SDM-Car-SU, a benchmark that injects controlled multi-directional and multi-speed platform motions into simulated scenes. On both this dataset and real unstabilized satellite sequences the method records higher MOTA scores than prior trackers.
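The temporal-fusion intuition behind that cross-frame step can be pictured with a toy recurrent blend, an exponential moving average over already-aligned frames. This is purely an editorial sketch of why accumulation helps tiny objects, not the authors' TDFP module:

```python
import numpy as np

def temporal_fuse(feats, decay=0.6):
    """Blend each frame's (already aligned) feature map with an
    exponential moving average of earlier frames, so the weak but
    persistent response of a tiny object accumulates while
    per-frame noise averages out."""
    fused, state = [], None
    for f in feats:
        state = f.copy() if state is None else decay * state + (1 - decay) * f
        fused.append(state)
    return fused

# A faint stationary target (post-alignment) buried in per-frame noise.
rng = np.random.default_rng(0)
target = np.zeros((32, 32))
target[14:18, 14:18] = 0.3
frames = [target + rng.normal(0.0, 0.2, (32, 32)) for _ in range(8)]
fused = temporal_fuse(frames)
```

In the fused maps the background noise standard deviation shrinks while the target response stays near 0.3, which is the effect the paper attributes to its temporal pyramid, only here in scalar form.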

Core claim

DeTracker decouples motion in unstabilized satellite videos through global semantic alignment at the feature level, which suppresses dominant platform motion, and local refinement, which captures target-specific motion. Combined with cross-frame temporal fusion that strengthens the continuity of weak vehicle signals, this yields more stable trajectories and more consistent identities.

What carries the argument

The Global-Local Motion Decoupling (GLMD) module, which suppresses background-dominated motion via global semantic alignment and captures target motion through local refinement.
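In image space the global-local split can be sketched with phase correlation, a deliberately crude stand-in for the GLMD's learned feature-level alignment; the pure-translation jitter and the toy scene are both simplifying assumptions:

```python
import numpy as np

def global_shift(a, b):
    """Dominant integer translation taking frame a to frame b,
    estimated by phase correlation over the whole frame."""
    R = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    corr = np.fft.ifft2(R / (np.abs(R) + 1e-9)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    if dy > h // 2:                        # wrap-around peaks are negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def decouple(a, b, centers, win=8):
    """Cancel the global platform shift, then re-estimate the small
    residual (target-specific) motion inside each local window."""
    gy, gx = global_shift(a, b)
    b_aligned = np.roll(b, (-gy, -gx), axis=(0, 1))
    residuals = []
    for y, x in centers:
        pa = a[y - win:y + win, x - win:x + win]
        pb = b_aligned[y - win:y + win, x - win:x + win]
        residuals.append(global_shift(pa, pb))
    return (gy, gx), residuals

# Toy scene: textured ground, a flat road patch, one bright vehicle.
rng = np.random.default_rng(0)
bg = rng.random((96, 96))
bg[24:56, 24:56] = 0.0                     # flat road region
f1 = bg.copy()
f1[38:42, 38:42] = 2.0                     # vehicle
f2 = np.roll(bg, (5, 7), axis=(0, 1))      # platform jitter of (+5, +7)
f2[44:48, 45:49] = 2.0                     # vehicle moved a further (+1, 0)

platform, residuals = decouple(f1, f2, [(40, 40)])
```

On this scene the sketch recovers the platform shift (5, 7) and the residual vehicle motion (1, 0); GLMD performs the analogous split with learned alignment on features rather than FFT correlation on pixels.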

If this is right

  • Trajectory stability increases when global alignment removes the bulk of platform-induced displacement before local tracking begins.
  • Identity consistency improves for vehicles whose appearance is too weak to rely on alone.
  • Temporal feature fusion across frames raises the discriminability of tiny objects under varying motion speeds.
  • The SDM-Car-SU benchmark allows direct measurement of robustness across different jitter directions and velocities.
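A benchmark of this kind can be emulated in a few lines: shift stabilized frames by a directional drift plus random per-frame jitter. The direction, speed, and noise scale below are hypothetical parameters, not SDM-Car-SU's actual generation protocol:

```python
import numpy as np

def inject_jitter(frames, direction_deg, speed, seed=0):
    """Apply a cumulative platform drift of the given direction and
    speed (pixels/frame) plus random per-frame jitter. Integer shifts
    via np.roll, so content wraps at the borders (a simplification)."""
    rng = np.random.default_rng(seed)
    theta = np.deg2rad(direction_deg)
    drift = np.array([np.sin(theta), np.cos(theta)]) * speed   # (dy, dx)
    shaken, offsets, pos = [], [], np.zeros(2)
    for f in frames:
        pos = pos + drift + rng.normal(0.0, 0.5, size=2)       # drift + jitter
        off = np.round(pos).astype(int)
        shaken.append(np.roll(f, tuple(off), axis=(0, 1)))
        offsets.append((int(off[0]), int(off[1])))
    return shaken, offsets

frames = [np.full((64, 64), float(i)) for i in range(10)]
shaky, offsets = inject_jitter(frames, direction_deg=45.0, speed=2.0)
```

Sweeping `direction_deg` and `speed` over a grid, while keeping the recorded `offsets` as ground-truth platform motion, is what would let a benchmark measure robustness per jitter direction and velocity.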

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same global-local split could be tested on drone footage or handheld camera sequences where camera shake dominates small-object motion.
  • If feature-level alignment proves sufficient, future work might replace explicit stabilization preprocessing with this learned decoupling step.
  • The approach suggests that any tracking task with a strong background motion bias may benefit from explicit semantic suppression before local association.

Load-bearing premise

The controlled platform motions added to the SDM-Car-SU dataset match the statistical structure of real satellite jitter without discarding useful target signals.

What would settle it

Real unstabilized satellite sequences in which DeTracker produces more identity switches or trajectory breaks than a baseline that does not separate global and local motion.

Figures

Figures reproduced from arXiv: 2601.09240 by Jiajun Chen, Jing Xiao, Jun Pan, Liang Liao, Mi Wang, Shaohan Cao, Yuming Zhu.

Figure 1: Major challenges of object tracking in unstabilized satellite videos. (a) Motion decoupling ambiguity: platform-induced jitter introduces complex … [figures/full_fig_p002_1.png]

Figure 2: Visualization of inter-frame motion and annotation mapping under unstabilized conditions. The left columns present the frame-wise annotation mapping … [figures/full_fig_p004_2.png]

Figure 3: Overview of the proposed DeTracker. The framework consists of three components: (1) a feature extraction backbone for acquiring multi-scale spatial … [figures/full_fig_p006_3.png]

Figure 4: The structure of the GLMD module comprises a global alignment … [figures/full_fig_p006_4.png]

Figure 5: Examples of detection results for vehicles moving in different directions. Rows one to three show frames 50, 120, and 189, respectively. [figures/full_fig_p010_5.png]

Figure 6: Examples of tracking trajectories; the motion trajectory of each object is shown in a distinct color within the same frame. [figures/full_fig_p010_6.png]

Figure 7: Visualization of the GLMD effect. Before alignment, the feature difference map between two consecutive frames contains many highlighted regions … [figures/full_fig_p011_7.png]

Figure 8: Feature map visualization results after incorporating different modules. [figures/full_fig_p011_8.png]
Original abstract

Satellite videos provide continuous observations of surface dynamics but pose significant challenges for multi-object tracking (MOT), especially under unstabilized conditions where platform jitter and the weak appearance of tiny objects jointly degrade tracking performance. To address this problem, we propose DeTracker, a joint-detection-and-tracking framework tailored for unstabilized satellite videos. DeTracker introduces a task-driven Global-Local Motion Decoupling (GLMD) module to address the motion imbalance between dominant platform motion and weak target motion. It suppresses background-dominated motion via global semantic alignment at the feature level and captures target-specific motion through local refinement, improving trajectory stability and identity consistency. In addition, a Temporal Dependency Feature Pyramid (TDFP) module is developed to perform cross-frame temporal feature fusion, enhancing the continuity and discriminability of tiny-object representations. We further construct a new benchmark dataset, SDM-Car-SU, which simulates multi-directional and multi-speed platform motions to enable systematic evaluation of tracking robustness under varying motion perturbations. Extensive experiments on both simulated and real unstabilized satellite videos demonstrate that DeTracker significantly outperforms existing methods, achieving 61.1% MOTA on SDM-Car-SU and 45.3% MOTA on real satellite video data. The code and dataset will be publicly available at https://github.com/alex-chenjiajun/DeTracker.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents DeTracker, a joint detection-and-tracking framework for vehicles in unstabilized satellite videos. It introduces a Global-Local Motion Decoupling (GLMD) module that performs global semantic alignment to suppress dominant platform motion while refining local target motion, a Temporal Dependency Feature Pyramid (TDFP) for cross-frame temporal feature fusion, and a new simulated benchmark SDM-Car-SU that injects multi-directional and multi-speed platform motions. Experiments report 61.1% MOTA on SDM-Car-SU and 45.3% MOTA on real unstabilized sequences, with claims of significant outperformance over existing methods.
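For reference, MOTA is the standard CLEAR-MOT score, MOTA = 1 - (FN + FP + IDSW) / GT. A minimal sketch, with hypothetical error counts chosen only to land on the paper's 61.1% headline:

```python
def mota(fn, fp, idsw, num_gt):
    """CLEAR-MOT accuracy: 1 - (misses + false positives + identity
    switches) divided by total ground-truth object instances."""
    return 1.0 - (fn + fp + idsw) / num_gt

# Hypothetical counts; many different error mixes yield the same score.
score = mota(fn=3000, fp=800, idsw=90, num_gt=10000)   # 0.611
```

The last point is the reason a single MOTA figure is hard to interpret: the metric pools misses, false positives, and identity switches, so two trackers with identical MOTA can fail in very different ways.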

Significance. If the central claims hold, the work provides a practical engineering advance for MOT under platform jitter and weak target appearance, a setting relevant to satellite-based monitoring. The explicit commitment to release code and the SDM-Car-SU dataset is a clear strength that supports reproducibility and future benchmarking.

major comments (2)
  1. [Dataset Construction] Dataset section: the assertion that SDM-Car-SU faithfully reproduces real unstabilized satellite jitter is load-bearing for both the GLMD module's reported gains and the generalization claim to real data, yet no quantitative matching (e.g., amplitude histograms, frequency spectra, or spatial correlation statistics of optical-flow vectors) between simulated and real sequences is provided.
  2. [Experiments] Experiments section: the headline MOTA figures (61.1% on SDM-Car-SU, 45.3% on real data) are presented without reported ablations isolating GLMD versus TDFP, without baseline implementation details or hyper-parameter settings, and without error bars or statistical significance tests, leaving the robustness of the outperformance claim difficult to evaluate.
minor comments (2)
  1. [Method] Notation in the GLMD description could be clarified by explicitly defining the global alignment loss and the local refinement operator before their first use.
  2. [Qualitative Results] Figure captions for the qualitative results should state the exact frame indices and motion parameters used so readers can replicate the visualized conditions.
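The quantitative matching the first major comment asks for could start from something as simple as comparing amplitude histograms and jitter spectra of per-frame motion traces. The traces, sampling rate, and bin settings below are illustrative, not the paper's data:

```python
import numpy as np

def jitter_stats(offsets, fps=10.0):
    """Per-frame motion trace -> amplitude histogram of inter-frame
    displacement plus the power spectrum of the (mean-removed) trace."""
    offsets = np.asarray(offsets, dtype=float)              # (T, 2): dy, dx
    amp = np.linalg.norm(np.diff(offsets, axis=0), axis=1)  # step amplitudes
    hist, _ = np.histogram(amp, bins=10, range=(0.0, 5.0), density=True)
    detrended = offsets - offsets.mean(axis=0)
    power = np.abs(np.fft.rfft(detrended[:, 0])) ** 2       # vertical jitter
    freqs = np.fft.rfftfreq(len(offsets), d=1.0 / fps)
    return hist, power, freqs

def hist_overlap(h1, h2, bin_width=0.5):
    """Histogram intersection in [0, 1]; 1 means identical amplitude stats.
    bin_width must match the histogram above (5.0 range / 10 bins)."""
    return float(np.minimum(h1, h2).sum() * bin_width)

rng = np.random.default_rng(0)
sim_trace = np.cumsum(rng.normal(0.0, 0.3, size=(50, 2)), axis=0)
real_trace = np.cumsum(rng.normal(0.0, 0.3, size=(50, 2)), axis=0)
h_sim, p_sim, f = jitter_stats(sim_trace)
h_real, _, _ = jitter_stats(real_trace)
match = hist_overlap(h_sim, h_real)
```

Reporting such an overlap (and the spectra side by side) for SDM-Car-SU versus real Luojia-style sequences would make the simulation-fidelity claim checkable rather than asserted.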

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Dataset Construction] Dataset section: the assertion that SDM-Car-SU faithfully reproduces real unstabilized satellite jitter is load-bearing for both the GLMD module's reported gains and the generalization claim to real data, yet no quantitative matching (e.g., amplitude histograms, frequency spectra, or spatial correlation statistics of optical-flow vectors) between simulated and real sequences is provided.

    Authors: We agree that quantitative validation would strengthen the claims regarding simulation fidelity. In the revised manuscript, we will add direct comparisons between SDM-Car-SU and real sequences, including amplitude histograms of platform motion, frequency spectra of jitter, and spatial correlation statistics of optical-flow vectors. This will provide explicit evidence supporting the simulation's realism. revision: yes

  2. Referee: [Experiments] Experiments section: the headline MOTA figures (61.1% on SDM-Car-SU, 45.3% on real data) are presented without reported ablations isolating GLMD versus TDFP, without baseline implementation details or hyper-parameter settings, and without error bars or statistical significance tests, leaving the robustness of the outperformance claim difficult to evaluate.

    Authors: We acknowledge the value of additional experimental details for assessing robustness. We will expand the experiments section to include ablations isolating the contributions of GLMD and TDFP, full baseline implementation details with hyper-parameter settings, and results reported with error bars from multiple runs together with statistical significance tests. revision: yes
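The promised significance testing could take the shape of a paired bootstrap over per-sequence scores; all numbers below are hypothetical placeholders, not the paper's results:

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_boot=10000, seed=0):
    """Probability that method A beats method B under resampling of
    per-sequence score differences: a simple paired significance check."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    return float((diffs[idx].mean(axis=1) > 0).mean())

# Hypothetical per-sequence MOTA for a method and a baseline.
ours = [0.62, 0.59, 0.64, 0.58, 0.63]
baseline = [0.55, 0.57, 0.52, 0.56, 0.54]
p_win = paired_bootstrap(ours, baseline)
```

A paired test is the right shape here because both trackers are scored on the same sequences; pooling unpaired scores would waste exactly the per-sequence variation that jitter severity induces.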

Circularity Check

0 steps flagged

No circularity: empirical method with held-out validation

Full rationale

The paper proposes two new modules (GLMD for global-local motion decoupling via semantic alignment and local refinement, TDFP for cross-frame temporal fusion) and a new simulated dataset SDM-Car-SU to evaluate tracking under multi-directional platform jitter. Performance is reported as empirical MOTA on held-out simulated data (61.1%) and separate real satellite sequences (45.3%). No derivation, equation, or claim reduces by construction to a fitted parameter, self-definition, or self-citation chain; the central results are measured outcomes against external test sets rather than tautological outputs of the same inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard deep learning assumptions for feature extraction and motion modeling without introducing new physical entities or free parameters beyond typical network hyperparameters.

axioms (1)
  • domain assumption Feature-level alignment and pyramid fusion can reliably separate dominant platform motion from weak target motion in satellite imagery.
    Invoked in the description of the GLMD and TDFP modules as the basis for improved trajectory stability.

pith-pipeline@v0.9.0 · 5552 in / 1189 out tokens · 55939 ms · 2026-05-16T14:40:57.843448+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages
