
arxiv: 2602.13636 · v2 · submitted 2026-02-14 · 💻 cs.CV

Layer-Guided UAV Tracking: Enhancing Efficiency and Occlusion Robustness

Pith reviewed 2026-05-15 22:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords UAV tracking · visual object tracking · layer selection · attention module · occlusion robustness · real-time tracking · efficiency · feature enhancement

The pith

LGTrack combines dynamic layer selection with lightweight GGCA and SGLA modules to track objects in UAV footage at 258.7 FPS on UAVDT while keeping 82.8 percent precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LGTrack as a unified framework for visual object tracking from unmanned aerial vehicles. It addresses the accuracy-efficiency trade-off by using dynamic layer selection to pick useful features, a Global-Grouped Coordinate Attention module that captures long-range context at low cost, and a Similarity-Guided Layer Adaptation module that replaces heavier distillation techniques. The result is real-time performance on standard UAV datasets together with maintained accuracy under occlusion. A sympathetic reader cares because UAV applications require both speed on embedded hardware and reliability when targets are briefly hidden.

Core claim

LGTrack is a unified UAV tracking framework that integrates dynamic layer selection, the lightweight Global-Grouped Coordinate Attention (GGCA) module for global context with minimal overhead, and the Similarity-Guided Layer Adaptation (SGLA) module for robust representation learning. The paper reports that this combination reaches 258.7 FPS on the UAVDT dataset, which it describes as state-of-the-art real-time speed, while keeping a competitive 82.8 percent precision. The experiments cover three benchmark datasets.
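To make the "global context with minimal overhead" idea concrete, here is a minimal sketch of the directional pooling used by standard coordinate attention, on which GGCA's name suggests it builds. It is not the paper's module: the learned convolutions are omitted, the channel grouping that GGCA presumably adds is not modelled, and the function names are hypothetical. The point is only that each channel is summarized by H + W numbers rather than H × W.

```python
import math
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # channels x height x width, as nested lists


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def directional_pool(feat: Tensor3) -> Tuple[List[List[float]], List[List[float]]]:
    """Average each channel along its width (C x H) and along its height (C x W)."""
    pooled_h = [[sum(row) / len(row) for row in ch] for ch in feat]
    pooled_w = [[sum(col) / len(col) for col in zip(*ch)] for ch in feat]
    return pooled_h, pooled_w


def coordinate_gate(feat: Tensor3) -> Tensor3:
    """Rescale every position by a row gate and a column gate.

    Assumption: the gates are plain sigmoids of the pooled values. A real
    coordinate-attention block would pass the pooled descriptors through
    learned 1x1 convolutions first.
    """
    pooled_h, pooled_w = directional_pool(feat)
    return [
        [
            [v * sigmoid(pooled_h[c][i]) * sigmoid(pooled_w[c][j])
             for j, v in enumerate(row)]
            for i, row in enumerate(ch)
        ]
        for c, ch in enumerate(feat)
    ]


# One channel, 2 x 2 feature map (toy values).
toy = [[[1.0, 3.0], [5.0, 7.0]]]
print(directional_pool(toy))
print(coordinate_gate(toy))
```

Because the two pooled descriptors span a full row and a full column, every output position is influenced by values far away from it, which is the sense in which this family of modules captures long-range context cheaply.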

What carries the argument

Dynamic layer selection decides which features are used; the GGCA module adds global context to them at low cost, and the SGLA module supplies similarity-based layer adaptation in place of knowledge distillation. Together these are meant to deliver both speed and occlusion robustness.
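As a rough illustration of what similarity-guided layer selection could look like, the sketch below drops a layer when its output points in almost the same direction as its input, measured by cosine similarity. The selection rule, the 0.99 threshold, the toy layers, and all function names are assumptions made for illustration; the summary here does not specify the rule the paper actually uses.

```python
import math
from typing import Callable, List, Sequence

Layer = Callable[[Sequence[float]], List[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_layers(layers: Sequence[Layer], sample: Sequence[float],
                  threshold: float = 0.99) -> List[int]:
    """Return the indices of layers worth keeping for this sample.

    Hypothetical rule: a layer is treated as redundant when the cosine
    similarity between its input and its output exceeds `threshold`.
    """
    kept: List[int] = []
    x = list(sample)
    for i, layer in enumerate(layers):
        y = layer(x)
        if cosine_similarity(x, y) <= threshold:
            kept.append(i)
            x = y  # only kept layers update the running features
    return kept


def forward(layers: Sequence[Layer], kept: Sequence[int],
            x: Sequence[float]) -> List[float]:
    """Run only the selected layers, in order."""
    out = list(x)
    for i in kept:
        out = layers[i](out)
    return out


# Toy stand-ins for network blocks.
def near_identity(v: Sequence[float]) -> List[float]:
    return [1.001 * x for x in v]      # pure rescaling, direction unchanged


def rotate(v: Sequence[float]) -> List[float]:
    return [v[-1]] + list(v[:-1])      # cyclic shift changes the direction


def reverse(v: Sequence[float]) -> List[float]:
    return list(v)[::-1]


blocks = [near_identity, rotate, near_identity, reverse]
kept = select_layers(blocks, [1.0, 2.0, 3.0])
print(kept, forward(blocks, kept, [1.0, 2.0, 3.0]))
```

In this toy run the two rescaling layers are skipped, so the forward pass costs two layer evaluations instead of four. Whether a real tracker can skip layers that cheaply depends on how the selection itself is computed at inference time.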

If this is right

  • Real-time tracking becomes feasible on low-power UAV platforms without sacrificing much accuracy.
  • Occlusion handling improves through the similarity-guided adaptation that avoids full distillation overhead.
  • The framework maintains competitive precision across multiple UAV tracking benchmarks.
  • Inference speed reaches 258.7 FPS on UAVDT while using only the proposed lightweight modules.
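To put the throughput figure in per-frame terms, the snippet below converts FPS into latency and compares it with a camera's frame interval. The 30 FPS camera rate is an assumed example, not a figure from the paper, and the hardware behind the reported 258.7 FPS is not stated in this summary.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def headroom(tracker_fps: float, camera_fps: float) -> float:
    """How many times faster the tracker runs than frames arrive."""
    if camera_fps <= 0:
        raise ValueError("camera_fps must be positive")
    return tracker_fps / camera_fps


REPORTED_FPS = 258.7   # speed reported on UAVDT
CAMERA_FPS = 30.0      # assumed camera rate, for illustration only

print(f"tracker latency: {frame_latency_ms(REPORTED_FPS):.2f} ms per frame")
print(f"camera interval: {frame_latency_ms(CAMERA_FPS):.2f} ms per frame")
print(f"headroom: {headroom(REPORTED_FPS, CAMERA_FPS):.2f}x")
```

At the reported speed the tracker spends about 3.87 ms per frame. Slower embedded hardware would eat into that margin, which is why the hardware question matters.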

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The layer-selection idea could be tested on other real-time vision tasks such as drone-based surveillance or autonomous navigation.
  • Replacing distillation with SGLA might simplify training pipelines for similar lightweight trackers.
  • The approach may extend to video object tracking outside UAV settings if the same layer-guidance logic holds.
  • Hardware-specific speed measurements would clarify whether the reported FPS transfers to different embedded processors.

Load-bearing premise

The GGCA and SGLA modules actually deliver the stated speed gains and occlusion handling without hidden accuracy costs that appear only in full tests.

What would settle it

Re-running the released code on UAVDT under the paper's occlusion test protocol and obtaining either under 200 FPS or under 70 percent precision on the same hardware.
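The test above can be written down as an explicit check. The sketch below only encodes the two thresholds named in the text (200 FPS and 70 percent precision); the `Measurement` type and function names are hypothetical, and the numbers fed in would have to come from an actual re-run of the released code.

```python
from dataclasses import dataclass
from typing import List

FPS_FLOOR = 200.0         # "under 200 FPS" counts against the claim
PRECISION_FLOOR = 70.0    # "under 70 percent precision" counts against the claim


@dataclass(frozen=True)
class Measurement:
    """One re-run result on UAVDT (hypothetical container)."""
    fps: float
    precision_pct: float


def failed_criteria(m: Measurement) -> List[str]:
    """Return the names of the criteria this measurement falls below."""
    failures: List[str] = []
    if m.fps < FPS_FLOOR:
        failures.append("fps")
    if m.precision_pct < PRECISION_FLOOR:
        failures.append("precision")
    return failures


def claim_survives(m: Measurement) -> bool:
    """True when neither threshold is undercut."""
    return not failed_criteria(m)


# The paper's own reported figures pass by construction.
print(claim_survives(Measurement(fps=258.7, precision_pct=82.8)))
```

Either failure alone is enough to count against the claim, which matches the "either ... or" wording of the test.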

Original abstract

Visual object tracking (VOT) plays a pivotal role in unmanned aerial vehicle (UAV) applications. Addressing the trade-off between accuracy and efficiency, especially under challenging conditions like unpredictable occlusion, remains a significant challenge. This paper introduces LGTrack, a unified UAV tracking framework that integrates dynamic layer selection, efficient feature enhancement, and robust representation learning for occlusions. By employing a novel lightweight Global-Grouped Coordinate Attention (GGCA) module, LGTrack captures long-range dependencies and global contexts, enhancing feature discriminability with minimal computational overhead. Additionally, a lightweight Similarity-Guided Layer Adaptation (SGLA) module replaces knowledge distillation, achieving an optimal balance between tracking precision and inference efficiency. Experiments on three datasets demonstrate LGTrack's state-of-the-art real-time speed (258.7 FPS on UAVDT) while maintaining competitive tracking accuracy (82.8% precision). Code is available at https://github.com/XiaoMoc/LGTrack

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces LGTrack, a unified UAV tracking framework combining dynamic layer selection, a lightweight Global-Grouped Coordinate Attention (GGCA) module to capture long-range dependencies with low overhead, and a Similarity-Guided Layer Adaptation (SGLA) module that replaces knowledge distillation for occlusion robustness. Experiments on three datasets report state-of-the-art real-time performance of 258.7 FPS and 82.8% precision on UAVDT, with code released at https://github.com/XiaoMoc/LGTrack.

Significance. If the reported speed-accuracy trade-off holds under the described conditions, the work is significant for real-time UAV applications where occlusion handling and efficiency are critical. The lightweight design of GGCA and SGLA, together with public code release, supports reproducibility and practical adoption; the internal consistency of the module descriptions and experimental coverage across datasets strengthens the contribution.

minor comments (3)
  1. Abstract: the performance claims would be easier to assess if the abstract briefly named the main baselines against which 258.7 FPS and 82.8% precision are compared.
  2. §3 (Method): the interaction between dynamic layer selection and the SGLA module could be illustrated with a single diagram or pseudocode line to clarify the forward pass.
  3. Table 1 or equivalent results section: report standard deviations or multiple runs for the FPS and precision numbers to quantify variability.
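The third comment, on run-to-run variability, can be addressed with a small timing harness. The sketch below repeats a throughput measurement several times and reports the mean and sample standard deviation. The per-frame function is a dummy workload standing in for a tracker step, and the run counts are arbitrary assumptions.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def measure_fps(process_frame: Callable[[], None], n_frames: int) -> float:
    """Time `n_frames` calls and return frames per second."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (needs at least two values)."""
    return statistics.mean(values), statistics.stdev(values)


def benchmark(process_frame: Callable[[], None], n_frames: int = 200,
              n_runs: int = 5, warmup: int = 20) -> Tuple[float, float]:
    """Warm up, then repeat the FPS measurement `n_runs` times."""
    for _ in range(warmup):
        process_frame()
    runs = [measure_fps(process_frame, n_frames) for _ in range(n_runs)]
    return summarize(runs)


def dummy_tracker_step() -> None:
    """Placeholder workload; a real test would run one tracker update here."""
    sum(i * i for i in range(2000))


mean_fps, std_fps = benchmark(dummy_tracker_step)
print(f"{mean_fps:.1f} ± {std_fps:.1f} FPS over 5 runs")
```

On a GPU the timed function would also need to wait for queued device work to finish before the clock stops; that step is left out of this CPU-only sketch.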

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation for minor revision. We appreciate the recognition of LGTrack's real-time performance, lightweight modules, and reproducibility via public code release.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents LGTrack as an engineering framework combining dynamic layer selection with two lightweight modules (GGCA and SGLA) whose designs are described directly in the text rather than derived from prior results. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains appear that would reduce any claimed performance gain to an input by construction. The reported FPS and precision figures are empirical measurements on standard benchmarks, not outputs of a closed-form derivation. The argument is therefore self-contained and externally falsifiable via the released code and datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5470 in / 1115 out tokens · 34314 ms · 2026-05-15T22:21:04.699083+00:00 · methodology

