SCT-MOT: Enhancing Air-to-Air Multiple UAVs Tracking with Swarm-Coupled Motion and Trajectory Guidance
Pith reviewed 2026-05-10 18:59 UTC · model grok-4.3
The pith
SCT-MOT tracks multiple UAVs in swarms more accurately by modeling their coupled motions and guiding visual features with predicted trajectories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a Swarm Motion-Aware Trajectory Prediction (SMTP) module, which processes the swarm's historical trajectories together with posture-aware appearance features, forecasts nonlinear group trajectories more accurately than per-object prediction. They further establish that fusing these forecasts with current-frame features through a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module strengthens temporal consistency for weak objects and improves overall tracking performance.
What carries the argument
The argument rests on the Swarm Motion-Aware Trajectory Prediction (SMTP) module, which jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective to forecast coupled group motions.
Load-bearing premise
That treating the UAVs as a coupled swarm system rather than independent objects will produce better motion forecasts and tracking consistency.
What would settle it
An experiment showing that trajectory prediction accuracy does not improve when using swarm-level modeling compared to per-object modeling on the AIRMOT dataset would disprove the core benefit of the SMTP module.
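A minimal sketch of how such a comparison could be scored, assuming prediction accuracy is measured with average and final displacement error (ADE/FDE), which are common trajectory-forecasting metrics. This page does not state which accuracy metric the paper uses, and the function names and toy coordinates below are hypothetical.

```python
import math

def ade_fde(pred, gt):
    """Return (ADE, FDE) for one trajectory.

    pred, gt: equal-length lists of (x, y) positions over the forecast horizon.
    ADE is the mean Euclidean error over all steps; FDE is the error at the last step.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors), errors[-1]

def mean_ade_fde(preds, gts):
    """Average ADE and FDE over all UAVs in a scene."""
    pairs = [ade_fde(p, g) for p, g in zip(preds, gts)]
    n = len(pairs)
    return sum(a for a, _ in pairs) / n, sum(f for _, f in pairs) / n

# Toy data (hypothetical, not from the paper): two UAVs, three future steps.
gt = [[(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
      [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]]
per_object = [[(1.0, 1.0), (2.0, 1.0), (3.0, 2.0)],
              [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]]
swarm_level = [[(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],
               [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]]

print("per-object  ADE/FDE:", mean_ade_fde(per_object, gt))
print("swarm-level ADE/FDE:", mean_ade_fde(swarm_level, gt))
```

If the swarm-level predictor did not score lower than the per-object predictor on real AIRMOT sequences under a metric of this kind, the premise above would be in doubt.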
Original abstract
Air-to-air tracking of swarm UAVs presents significant challenges due to the complex nonlinear group motion and weak visual cues for small objects, which often cause detection failures, trajectory fragmentation, and identity switches. Although existing methods have attempted to improve performance by incorporating trajectory prediction, they model each object independently, neglecting the swarm-level motion dependencies. Their limited integration between motion prediction and appearance representation also weakens the spatio-temporal consistency required for tracking in visually ambiguous and cluttered environments, making it difficult to maintain coherent trajectories and reliable associations. To address these challenges, we propose SCT-MOT, a tracking framework that integrates Swarm-Coupled motion modeling and Trajectory-guided feature fusion. First, we develop a Swarm Motion-Aware Trajectory Prediction (SMTP) module that jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective, enabling more accurate forecasting of the nonlinear, coupled group trajectories. Second, we design a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module that aligns predicted positions with historical visual cues and deeply integrates them with current frame features, enhancing temporal consistency and spatial discriminability for weak objects. Extensive experiments on three public air-to-air swarm UAV tracking datasets, AIRMOT, MOT-FLY, and UAVSwarm, demonstrate that SMTP achieves more accurate trajectory forecasts and yields a 1.21% IDF1 improvement over the state-of-the-art trajectory prediction module EqMotion when integrated into the same MOT framework. Overall, our SCT-MOT consistently achieves superior accuracy and robustness compared to state-of-the-art trackers across multiple metrics under complex swarm scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SCT-MOT, a tracking framework for air-to-air multiple UAV swarms that integrates two new modules: Swarm Motion-Aware Trajectory Prediction (SMTP), which jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective to forecast nonlinear coupled group motions, and Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF), which aligns predicted positions with historical visual cues to improve temporal consistency and spatial discriminability for weak objects. Experiments on AIRMOT, MOT-FLY, and UAVSwarm datasets report consistent superiority over state-of-the-art trackers across multiple metrics, with SMTP yielding a 1.21% IDF1 improvement over EqMotion when substituted into the same MOT pipeline.
Significance. If the reported gains are reproducible, the work offers a meaningful advance in multi-object tracking for UAV swarms by explicitly incorporating swarm-level motion coupling and tighter motion-appearance integration, addressing key failure modes (trajectory fragmentation, identity switches) in visually ambiguous aerial scenarios. The provision of module ablations and a controlled replacement of EqMotion strengthens the evidential basis for the central claims.
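For readers unfamiliar with the headline metric: IDF1 is the harmonic mean of identity precision and identity recall, computed from identity-level true positives (IDTP), false positives (IDFP), and false negatives (IDFN). The sketch below shows the standard formula only. The counts are hypothetical and are not taken from the paper, and this page does not say whether the reported 1.21% is an absolute gain in percentage points or a relative one.

```python
def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN).

    Equivalent to the harmonic mean of ID precision and ID recall.
    Returns 0.0 when there are no detections and no ground truth.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0

# Hypothetical counts for two trackers on the same sequence.
baseline = idf1(idtp=8000, idfp=1500, idfn=2500)
candidate = idf1(idtp=8120, idfp=1440, idfn=2380)

print(f"baseline IDF1:  {100 * baseline:.2f}%")
print(f"candidate IDF1: {100 * candidate:.2f}%")
print(f"absolute gain:  {100 * (candidate - baseline):.2f} points")
```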
minor comments (3)
- [Abstract] The claim of 'superior accuracy and robustness ... across multiple metrics' would be strengthened by naming the specific metrics (e.g., MOTA, IDF1, HOTA) and the magnitude of gains on each dataset rather than relying on the single 1.21% IDF1 figure.
- [§3.2] The description of TG-STFF states that it 'aligns predicted positions with historical visual cues and deeply integrates them'; a short schematic or pseudocode in §3.2 would clarify the exact alignment operation and fusion depth.
- [Experimental results tables] Tables reporting results on AIRMOT, MOT-FLY, and UAVSwarm should include standard deviations or confidence intervals for the key metrics to allow assessment of whether the observed deltas exceed experimental variability.
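The third comment can be made concrete with a short sketch, assuming each tracker is trained and evaluated several times (for example with different random seeds) so that per-run metric values are available. It reports mean and standard deviation per tracker and a percentile-bootstrap confidence interval for the paired difference. The scores are hypothetical, and the percentile bootstrap is one reasonable choice here, not a procedure the paper or the referee prescribes.

```python
import random
import statistics

def mean_std(values):
    """Sample mean and sample standard deviation of per-run scores."""
    return statistics.mean(values), statistics.stdev(values)

def bootstrap_ci(deltas, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired per-run differences."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(deltas)
    means = sorted(
        statistics.mean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical IDF1 scores (%) from five runs of each tracker.
ours = [71.9, 72.4, 72.1, 72.6, 72.0]
baseline = [70.8, 71.3, 70.6, 71.5, 71.0]
deltas = [a - b for a, b in zip(ours, baseline)]

print("ours     mean/std:", mean_std(ours))
print("baseline mean/std:", mean_std(baseline))
print("95% CI for the mean gain:", bootstrap_ci(deltas))
```

A gain is more convincing when the interval for the difference excludes zero. With only a handful of runs the interval is coarse, so it should be read as a rough guide.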
Simulated Author's Rebuttal
We thank the referee for the positive summary of our work and the recommendation for minor revision. The recognition that SCT-MOT addresses key challenges in air-to-air swarm UAV tracking through swarm-level motion coupling and trajectory-guided feature fusion is appreciated, as is the note on the evidential support from ablations and controlled comparisons.
Circularity Check
No significant circularity identified
The paper proposes two new modules (SMTP for swarm-coupled trajectory prediction and TG-STFF for trajectory-guided feature fusion) integrated into an MOT framework. The central claims rest on empirical results from ablations and comparisons against baselines like EqMotion on three datasets, with reported gains such as 1.21% IDF1 improvement. No equations, derivations, or self-referential definitions are present that reduce the claimed performance improvements to fitted parameters or prior self-citations by construction. The method is described as an integration of novel components without load-bearing steps that collapse to inputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: SMTP module jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective... temporal-posture attention... global-local spatial-posture attention... temporal residual module with dilated causal convolutions
- IndisputableMonolith/Foundation/DimensionForcing.lean: alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: TG-STFF... multi-head cross-attention... Gaussian kernel centered at predicted locations... fuses predictive feature maps with current frame features
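To illustrate the phrase "Gaussian kernel centered at predicted locations" in the TG-STFF passage, here is a minimal sketch of a spatial prior built from predicted UAV positions and used to reweight a single-channel feature map. It is an assumption-laden illustration, not the paper's TG-STFF: the function names, the element-wise max across UAVs, the multiplicative reweighting, and the value of sigma are all hypothetical, and the cross-attention step mentioned in the passage is not shown.

```python
import math

def gaussian_prior(height, width, center, sigma):
    """Return a height x width map of exp(-d^2 / (2*sigma^2)) around center=(cx, cy)."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def swarm_prior(height, width, centers, sigma):
    """Combine per-UAV priors with an element-wise max so values stay in (0, 1]."""
    maps = [gaussian_prior(height, width, c, sigma) for c in centers]
    return [[max(m[y][x] for m in maps) for x in range(width)]
            for y in range(height)]

def modulate(feature, prior):
    """Reweight a single-channel feature map element-wise by the prior."""
    return [[f * p for f, p in zip(frow, prow)]
            for frow, prow in zip(feature, prior)]

# Two predicted UAV positions (x, y) on a 5 x 7 feature grid (hypothetical values).
prior = swarm_prior(5, 7, centers=[(1, 1), (5, 3)], sigma=1.0)
feature = [[2.0] * 7 for _ in range(5)]
weighted = modulate(feature, prior)

for row in weighted:
    print(" ".join(f"{v:.2f}" for v in row))
```

Responses are kept at full strength at the predicted positions and attenuated elsewhere, which is one simple way a trajectory forecast could guide attention toward weak, small objects.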
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
First author (from the paper's biography): Zhaochen Chu received the B.E. degree in Science in Flight Vehicle Design and Engineering from Beijing Institute of Technology, Beijing, China, in 2021. He is currently pursuing the Ph.D. de...