Layer-Guided UAV Tracking: Enhancing Efficiency and Occlusion Robustness

Derui Ding; Haohua Zhang; Ran Sun; Yang Zhou; Ying Sun

arXiv: 2602.13636 · v2 · submitted 2026-02-14 · cs.CV

Pith reviewed 2026-05-15 22:21 UTC · model grok-4.3

keywords: UAV tracking, visual object tracking, layer selection, attention module, occlusion robustness, real-time tracking, efficiency, feature enhancement

The pith

LGTrack combines dynamic layer selection with lightweight GGCA and SGLA modules to track objects from UAV footage at 258.7 FPS on UAVDT while keeping 82.8 percent precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LGTrack as a unified framework for visual object tracking from unmanned aerial vehicles. It addresses the accuracy-efficiency trade-off by using dynamic layer selection to pick useful features, a Global-Grouped Coordinate Attention module that captures long-range context at low cost, and a Similarity-Guided Layer Adaptation module that replaces heavier distillation techniques. The result is real-time performance on standard UAV datasets together with maintained accuracy under occlusion. A sympathetic reader cares because UAV applications require both speed on embedded hardware and reliability when targets are briefly hidden.

Core claim

LGTrack is a unified UAV tracking framework that integrates dynamic layer selection, the lightweight Global-Grouped Coordinate Attention (GGCA) module for global context with minimal overhead, and the Similarity-Guided Layer Adaptation (SGLA) module for robust representation learning. This combination yields state-of-the-art real-time speed of 258.7 FPS on the UAVDT dataset while preserving competitive tracking accuracy of 82.8 percent precision, as shown across three benchmark datasets.
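The 82.8 percent figure is a precision score. As a minimal sketch of how such a score is commonly computed in tracking benchmarks, the snippet below counts the fraction of frames whose predicted box center lies within a pixel threshold of the ground-truth center. The 20-pixel threshold is the usual convention and is an assumption here, as are the function names and the toy boxes; the summary above does not state the paper's exact evaluation protocol.

```python
import math


def center(box):
    """Return the (cx, cy) center of an (x, y, w, h) box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def precision(pred_boxes, gt_boxes, threshold_px=20.0):
    """Fraction of frames whose center location error is <= threshold_px.

    threshold_px=20 is the common benchmark convention (assumed, not taken
    from the paper).
    """
    if len(pred_boxes) != len(gt_boxes) or not gt_boxes:
        raise ValueError("need equally sized, non-empty box lists")
    hits = 0
    for pred, gt in zip(pred_boxes, gt_boxes):
        (px, py), (gx, gy) = center(pred), center(gt)
        if math.hypot(px - gx, py - gy) <= threshold_px:
            hits += 1
    return hits / len(gt_boxes)


# Toy data (hypothetical): four frames, one of which drifts by 30 pixels.
gt = [(10, 10, 20, 20), (50, 50, 20, 20), (100, 100, 20, 20), (0, 0, 10, 10)]
pred = [(12, 11, 20, 20), (50, 80, 20, 20), (100, 100, 20, 20), (3, 4, 10, 10)]
score = precision(pred, gt)
print(f"precision@20px = {score:.2%}")
```

In this toy case three of the four frames fall inside the threshold, so the score is 75 percent.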

What carries the argument

The argument rests on dynamic layer selection, supported by the GGCA module for efficient global feature enhancement and by the SGLA module for similarity-based adaptation. SGLA stands in for knowledge distillation, and the combination is meant to support occlusion robustness.
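To make the idea of similarity-based layer selection concrete, here is a minimal sketch of one generic way to do it: compare each layer's output with its input and keep only the layers that change the features noticeably. This is an illustration under stated assumptions, not the paper's SGLA module, whose details are not given in this summary. The names `cosine_similarity`, `select_layers`, and `max_similarity`, the 0.99 cutoff, and the toy feature vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_layers(input_features, layer_outputs, max_similarity=0.99):
    """Return indices of layers whose output differs enough from their input.

    layer_outputs[i] is the feature vector recorded after layer i during an
    offline profiling pass. A layer whose output is nearly identical to its
    input (similarity >= max_similarity) is treated as skippable.
    """
    kept = []
    previous = input_features
    for index, output in enumerate(layer_outputs):
        if cosine_similarity(previous, output) < max_similarity:
            kept.append(index)
        previous = output
    return kept


# Toy profiling data (hypothetical): layers 0 and 2 barely change the features.
features_in = [1.0, 0.0, 0.0]
recorded = [
    [1.0, 0.01, 0.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.001],
    [0.0, 0.0, 1.0],
]
kept_layers = select_layers(features_in, recorded)
print(kept_layers)
```

The sketch profiles recorded features offline rather than deciding per frame at inference time; a per-frame variant would need a cheap predictor of the similarity, which is outside what this summary describes.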

If this is right

Real-time tracking becomes feasible on low-power UAV platforms without sacrificing much accuracy.
Occlusion handling improves through the similarity-guided adaptation that avoids full distillation overhead.
The framework maintains competitive precision across multiple UAV tracking benchmarks.
Inference speed reaches 258.7 FPS on UAVDT while using only the proposed lightweight modules.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The layer-selection idea could be tested on other real-time vision tasks such as drone-based surveillance or autonomous navigation.
Replacing distillation with SGLA might simplify training pipelines for similar lightweight trackers.
The approach may extend to video object tracking outside UAV settings if the same layer-guidance logic holds.
Hardware-specific speed measurements would clarify whether the reported FPS transfers to different embedded processors.

Load-bearing premise

The GGCA and SGLA modules actually deliver the stated speed gains and occlusion handling without hidden accuracy costs that appear only in full tests.

What would settle it

Re-running the released code on UAVDT under the paper's occlusion test protocol and obtaining either under 200 FPS or under 70 percent precision on the same hardware.
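The criterion above can be written down as a small check. This is only a sketch of the decision rule as stated here: the 200 FPS and 70 percent floors come from the sentence above, while the function name and argument names are hypothetical and the measurements themselves would have to come from an actual re-run.

```python
def claim_refuted(measured_fps, measured_precision_pct,
                  fps_floor=200.0, precision_floor_pct=70.0):
    """Return True if a re-run falls below either floor named above."""
    return measured_fps < fps_floor or measured_precision_pct < precision_floor_pct


# The paper's own reported numbers pass the check by construction.
reported_ok = not claim_refuted(258.7, 82.8)
print(reported_ok)
```

A re-run only counts if it uses the same hardware and the same occlusion protocol; otherwise a low FPS reading says more about the device than about the method.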

Original abstract

Visual object tracking (VOT) plays a pivotal role in unmanned aerial vehicle (UAV) applications. Addressing the trade-off between accuracy and efficiency, especially under challenging conditions like unpredictable occlusion, remains a significant challenge. This paper introduces LGTrack, a unified UAV tracking framework that integrates dynamic layer selection, efficient feature enhancement, and robust representation learning for occlusions. By employing a novel lightweight Global-Grouped Coordinate Attention (GGCA) module, LGTrack captures long-range dependencies and global contexts, enhancing feature discriminability with minimal computational overhead. Additionally, a lightweight Similarity-Guided Layer Adaptation (SGLA) module replaces knowledge distillation, achieving an optimal balance between tracking precision and inference efficiency. Experiments on three datasets demonstrate LGTrack's state-of-the-art real-time speed (258.7 FPS on UAVDT) while maintaining competitive tracking accuracy (82.8% precision). Code is available at https://github.com/XiaoMoc/LGTrack

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces LGTrack, a unified UAV tracking framework combining dynamic layer selection, a lightweight Global-Grouped Coordinate Attention (GGCA) module to capture long-range dependencies with low overhead, and a Similarity-Guided Layer Adaptation (SGLA) module that replaces knowledge distillation for occlusion robustness. Experiments on three datasets report state-of-the-art real-time performance of 258.7 FPS and 82.8% precision on UAVDT, with code released at https://github.com/XiaoMoc/LGTrack.

Significance. If the reported speed-accuracy trade-off holds under the described conditions, the work is significant for real-time UAV applications where occlusion handling and efficiency are critical. The lightweight design of GGCA and SGLA, together with public code release, supports reproducibility and practical adoption; the internal consistency of the module descriptions and experimental coverage across datasets strengthens the contribution.

minor comments (3)

Abstract: the performance claims would be easier to assess if the abstract briefly named the main baselines against which 258.7 FPS and 82.8% precision are compared.
§3 (Method): the interaction between dynamic layer selection and the SGLA module could be illustrated with a single diagram or pseudocode line to clarify the forward pass.
Table 1 or equivalent results section: report standard deviations or multiple runs for the FPS and precision numbers to quantify variability.
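The third comment asks for variability in the speed numbers. A minimal sketch of how that could be reported is below: time several full passes over a sequence, discard a warm-up pass, and report mean and sample standard deviation of FPS. The function `measure_fps`, the `dummy_tracker` stand-in, and the run counts are assumptions for illustration; nothing here reflects how the paper measured its 258.7 FPS.

```python
import statistics
import time


def measure_fps(process_frame, frames, runs=5, warmup=1):
    """Return (mean_fps, std_fps) over `runs` timed passes after `warmup` passes."""
    if runs < 2:
        raise ValueError("need at least 2 timed runs for a standard deviation")
    fps_per_run = []
    for run in range(warmup + runs):
        start = time.perf_counter()
        for frame in frames:
            process_frame(frame)
        elapsed = time.perf_counter() - start
        if run >= warmup:  # skip warm-up passes
            fps_per_run.append(len(frames) / elapsed)
    return statistics.mean(fps_per_run), statistics.stdev(fps_per_run)


def dummy_tracker(frame):
    """Hypothetical stand-in for a tracker's per-frame update."""
    return sum(range(1000))


mean_fps, std_fps = measure_fps(dummy_tracker, list(range(50)), runs=3)
print(f"{mean_fps:.1f} +/- {std_fps:.1f} FPS")
```

On a GPU, execution is usually asynchronous, so a device synchronization before each clock read is needed for the timings to be meaningful.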

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation for minor revision. We appreciate the recognition of LGTrack's real-time performance, lightweight modules, and reproducibility via public code release.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents LGTrack as an engineering framework combining dynamic layer selection with two lightweight modules (GGCA and SGLA) whose designs are described directly in the text rather than derived from prior results. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains appear that would reduce any claimed performance gain to an input by construction. The reported FPS and precision figures are empirical measurements on standard benchmarks, not outputs of a closed-form derivation. The argument is therefore self-contained and externally falsifiable via the released code and datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no information on free parameters, axioms, or invented entities.


[1] [1]

Visual Computer42(1) (2025) https://doi.org/10.1007/ s00371-025-04309-6

Lu, M.: Ureptrack: single-branch poolformer for unified attention-free rgb-event visual object tracking. Visual Computer42(1) (2025) https://doi.org/10.1007/ s00371-025-04309-6

work page 2025

[2] [2]

Visual Computer41(9), 6631–6644 (2025) https://doi.org/10.1007/s00371-025-03964-z

Yang, K., Zhang, W., Li, P., Liang, J., Peng, T., Chen, J., Li, L., Hu, X., Liu, J.: Vit-bf: vision transformer with border-aware features for visual tracking. Visual Computer41(9), 6631–6644 (2025) https://doi.org/10.1007/s00371-025-03964-z

work page doi:10.1007/s00371-025-03964-z 2025

[3] [3]

Visual Computer40(12), 8987–9003 (2024) https://doi.org/10.1007/s00371-024-03290-w

Chen, Z., Liu, L., Yu, Z.: Toward robust visual tracking for uav with adaptive spatial-temporal weighted regularization. Visual Computer40(12), 8987–9003 (2024) https://doi.org/10.1007/s00371-024-03290-w

work page doi:10.1007/s00371-024-03290-w 2024

[4] [4]

Visual Computer 41(11), 8627–8644 (2025) https://doi.org/10.1007/s00371-025-03888-8

Karakostas, I., Mygdalis, V., Nikolaidis, N., Pitas, I.: Enhancing visual object tracking robustness through a lightweight denoising module. Visual Computer 41(11), 8627–8644 (2025) https://doi.org/10.1007/s00371-025-03888-8

work page doi:10.1007/s00371-025-03888-8 2025

[5] [5]

Sensors25(20), 6403 (2025) https://doi.org/10.3390/s25206403

Gharsa, O., Touba, M.M., Boumehraz, M., Agram, N.: Autonomous vision-based object detection and tracking system for quadrotor unmanned aerial vehicles. Sensors25(20), 6403 (2025) https://doi.org/10.3390/s25206403

work page doi:10.3390/s25206403 2025

[6] [6]

PoseNet: A convolutional network for real-time 6-dof camera relocalization,

Danelljan, M., Hager, G., Khan, F.S., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: 2015 IEEE International Conference on 21 Computer Vision (ICCV), pp. 4310–4318 (2015). https://doi.org/10.1109/iccv. 2015.490

work page doi:10.1109/iccv 2015

[7] [7]

Danelljan, M., Hager, G., Khan, F.S., Felsberg, M.: Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(8), 1561–1575 (2017) https://doi.org/10.1109/tpami.2016.2609928

[8] [8]

Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(3), 583–596 (2015) https://doi.org/10.1109/tpami.2014.2345390

[9] [9]

Shao, Y., Yang, H., Gao, R., Li, F.: Three-dimensional obstacle avoidance path planning for agricultural UAV based on improved ant colony algorithm. International Journal of Network Dynamics and Intelligence 4(4), 100028 (2025) https://doi.org/10.53941/ijndi.2025.100028

[10] [10]

Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with siamese region proposal network. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/cvpr.2018.00935

[11] [11]

Wu, Y., Li, Y., Liu, M., Wang, X., Yang, X., Ye, H., Zeng, D., Zhao, Q., Li, S.: Learning an adaptive and view-invariant vision transformer for real-time uav tracking. IEEE Transactions on Circuits and Systems for Video Technology, 1–1 (2025) https://doi.org/10.1109/tcsvt.2025.3599856

[12] [13]

Ye, B., Chang, H., Ma, B., Shan, S., Chen, X.: Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, pp. 341–357 (2022). https://doi.org/10.1007/978-3-031-20047-2_20

[13] [14]

Li, S., Yang, Y., Zeng, D., Wang, X.: Adaptive and background-aware vision transformer for real-time uav tracking. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13943–13954 (2023). https://doi.org/10.1109/iccv51070.2023.01286

[14] [15]

Xue, C., Zhong, B., Liang, Q., Zheng, Y., Li, N., Xue, Y., Song, S.: Similarity-guided layer-adaptive vision transformer for uav tracking. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6730–6740 (2025). https://doi.org/10.1109/cvpr52734.2025.00631

[15] [16]

Wu, Y., Wang, X., Yang, X., Liu, M., Zeng, D., Ye, H., Li, S.: Learning occlusion-robust vision transformers for real-time uav tracking. In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17103–17113 (2025). https://doi.org/10.1109/cvpr52734.2025.01594

[16] [17]

Shen, F., Tang, J.: Imagpose: A unified conditional framework for pose-guided person generation. In: Advances in Neural Information Processing Systems 37. NeurIPS 2024, pp. 6246–6266 (2024). https://doi.org/10.52202/079017-0202

[17] [18]

Lin, C., Zou, C., Xu, H.: SCNet: A Dual-Branch Network for Strong Noisy Image Denoising Based on Swin Transformer and ConvNeXt. Computer Animation and Virtual Worlds 36 (2025) https://doi.org/10.1002/cav.70030

[18] [19]

Qiang, Z., Tao, W.: Enhancing visual SLAM localization accuracy through dynamic object detection and adaptive feature filtering. International Journal of Network Dynamics and Intelligence 4(8), 100018 (2025) https://doi.org/10.53941/ijndi.2025.100018

[19] [20]

Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: Eco: Efficient convolution operators for tracking. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6931–6939 (2017). https://doi.org/10.1109/cvpr.2017.733

[20] [21]

Li, Y., Fu, C., Ding, F., Huang, Z., Lu, G.: Autotrack: Towards high-performance visual tracking for uav with automatic spatio-temporal regularization. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11920–11929 (2020). https://doi.org/10.1109/cvpr42600.2020.01194

[21] [22]

Chen, L., Wu, P., Tan, W., Li, H., Chen, H., Zeng, N.: A novel UAV-based road damage detection algorithm with lightweight convolution and attention mechanism. International Journal of Network Dynamics and Intelligence 4(4), 100025 (2025) https://doi.org/10.53941/ijndi.2025.100025

[22] [23]

Xu, Y., Wang, Z., Li, Z., Yuan, Y., Yu, G.: Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. Proceedings of the AAAI Conference on Artificial Intelligence 34(07), 12549–12556 (2020) https://doi.org/10.1609/aaai.v34i07.6944

[23] [24]

Zhao, Y., Zhang, H., Lu, P., Li, P., Wu, E., Sheng, B.: DSD-MatchingNet: Deformable sparse-to-dense feature matching for learning accurate correspondences. In: Virtual Reality & Intelligent Hardware, pp. 432–443 (2022). https://doi.org/10.1016/j.vrih.2022.08.007

[24] [25]

Hu, X., Zhong, B., Liang, Q., Zhang, S., Li, N., Li, X., Ji, R.: Transformer tracking via frequency fusion. IEEE Transactions on Circuits and Systems for Video Technology 34(2), 1020–1031 (2024) https://doi.org/10.1109/tcsvt.2023.3289624

[25] [26]

Shi, L., Zhong, B., Liang, Q., Li, N., Zhang, S., Li, X.: Explicit visual prompts for visual object tracking. Proceedings of the AAAI Conference on Artificial Intelligence 38(5), 4838–4846 (2024) https://doi.org/10.1609/aaai.v38i5.28286

[26] [27]

Xie, J., Zhong, B., Mo, Z., Zhang, S., Shi, L., Song, S., Ji, R.: Autoregressive queries for adaptive tracking with spatio-temporal transformers. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 19300–19309 (2024). https://doi.org/10.1109/cvpr52733.2024.01826

[27] [28]

Yin, H., Vahdat, A., Alvarez, J.M., Mallya, A., Kautz, J., Molchanov, P.: A-vit: Adaptive tokens for efficient vision transformer. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022). https://doi.org/10.1109/cvpr52688.2022.01054

[28] [29]

Bakhtiarnia, A., Zhang, Q., Iosifidis, A.: Multi-exit vision transformer for dynamic inference. In: Proceedings of the British Machine Vision Conference 2021. BMVC 2021 (2021). https://doi.org/10.5244/c.35.338

[29] [30]

Park, J., Oh, Y., Moon, G., Choi, H., Lee, K.M.: Handoccnet: Occlusion-robust 3d hand mesh estimation network. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1486–1495 (2022). https://doi.org/10.1109/cvpr52688.2022.00155

[30] [31]

Wang, X., Lu, X., Bennamoun, M., Sheng, B.: Non-Rigid Point Cloud Registration via Anisotropic Hybrid Field Harmonization. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 7898–7915 (2025) https://doi.org/10.1109/tpami.2025.3572584

[31] [32]

Jiang, M., Wang, Y., McKeown, M.J., Wang, Z.J.: Occlusion-robust FAU recognition by mining latent space of masked autoencoders. Neurocomputing 569, 127107 (2024) https://doi.org/10.2139/ssrn.4342053

[32] [33]

Chi, C., Zhang, S., Xing, J., Lei, Z., Li, S.Z., Zou, X.: Pedhunter: Occlusion robust pedestrian detector in crowded scenes. Proceedings of the AAAI Conference on Artificial Intelligence 34(07), 10639–10646 (2020) https://doi.org/10.1609/aaai.v34i07.6690

[33] [34]

Das, S., Biswas, S.K., Purkayastha, B.: Occlusion robust sign language recognition system for indian sign language using cnn and pose features. Multimedia Tools and Applications 83(36), 84141–84160 (2024) https://doi.org/10.1007/s11042-024-19068-0

[34] [35]

Askar, W.A., Elmowafy, O., Ralescu, A., Youssif, A.A., Elnashar, G.A.: Occlusion detection and processing using optical flow and particle filter. International Journal of Advanced Intelligence Paradigms 15(1), 63 (2020) https://doi.org/10.1504/ijaip.2020.104107

[35] [36]

Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018). https://doi.org/10.1109/cvpr.2018.00745

[36] [37]

Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: Convolutional Block Attention Module, pp. 3–19 (2018). https://doi.org/10.1007/978-3-030-01234-2_1

[37] [38]

Hou, Q., Zhou, D., Feng, J.: Coordinate attention for efficient mobile network design. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13708–13717 (2021). https://doi.org/10.1109/cvpr46437.2021.01350

[38] [39]

Wen, Y., Luo, B., Shi, W., Ji, J., Cao, W., Yang, X., Sheng, B.: SAT-Net: Structure-Aware Transformer-Based Attention Fusion Network for Low-Quality Retinal Fundus Images Enhancement. In: IEEE Transactions on Multimedia, pp. 6198–6210 (2025). https://doi.org/10.1109/tmm.2025.3565935

[39] [40]

He, K., Chen, X., Xie, S., Li, Y., Dollar, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022). https://doi.org/10.1109/cvpr52688.2022.01553

[40] [41]

Mattfeldt, T.: Stochastic geometry and its applications. Journal of Microscopy 183, 257–257 (1996) https://doi.org/10.1046/j.1365-2818.1996.00654.x

[41] [42]

Chen, Y.: Thinning algorithms for simulating point processes. Florida State University, Tallahassee, FL (2016)

[42] [43]

Fan, H., Lin, L., Yang, F., et al.: Lasot: A high-quality benchmark for large-scale single object tracking. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5369–5378 (2019). https://doi.org/10.1109/cvpr.2019.00552

[43] [44]

Lin, T.-Y., Maire, M., et al.: Microsoft COCO: Common Objects in Context, pp. 740–755 (2014). https://doi.org/10.1007/978-3-319-10602-1_48

[44] [45]

Mueller, M., Bibi, A., et al.: TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild, pp. 310–327 (2018). https://doi.org/10.1007/978-3-030-01246-5_19

[45] [46]

Huang, L., Zhao, X., Huang, K.: Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(5), 1562–1577 (2021) https://doi.org/10.1109/tpami.2019.2957464

[46] [47]

Li, S., Yeung, D.-Y.: Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models. Proceedings of the AAAI Conference on Artificial Intelligence 31(1) (2017) https://doi.org/10.1609/aaai.v31i1.11205

[47] [48]

Yu, H., Li, G., Zhang, W., Huang, Q., Du, D., Tian, Q., Sebe, N.: The unmanned aerial vehicle benchmark: Object detection, tracking and baseline. International Journal of Computer Vision 128(5), 1141–1159 (2019) https://doi.org/10.1007/s11263-019-01266-1

[48] [49]

Mueller, M., Smith, N., Ghanem, B.: A Benchmark and Simulator for UAV Tracking, pp. 445–461 (2016). https://doi.org/10.1007/978-3-319-46448-0_27

[49] [50]

Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., Lu, H.: Transformer tracking. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8122–8131 (2021) https://doi.org/10.1109/CVPR46437.2021.00803

[50] [51]

Li, S., Liu, Y., Zhao, Q., Feng, Z.: Learning residue-aware correlation filters and refining scale for real-time uav tracking. Pattern Recognition 127, 108614 (2022) https://doi.org/10.1016/j.patcog.2022.108614

[51] [52]

Cao, Z., Fu, C., Ye, J., Li, B., Li, Y.: Hift: Hierarchical feature transformer for aerial tracking. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15437–15446 (2021). https://doi.org/10.1109/iccv48922.2021.01517

[52] [53]

Cao, Z., Huang, Z., Pan, L., Zhang, S., Liu, Z., Fu, C.: Tctrack: Temporal contexts for aerial tracking. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14778–14788 (2022). https://doi.org/10.1109/cvpr52688.2022.01438

[53] [54]

Zuo, H., Fu, C., Li, S., Lu, K., Li, Y., Feng, C.: Adversarial blur-deblur network for robust uav tracking. IEEE Robotics and Automation Letters 8(2), 1101–1108 (2023) https://doi.org/10.1109/lra.2023.3236584

[54] [55]

Yao, L., Fu, C., Li, S., Zheng, G., Ye, J.: Sgdvit: Saliency-guided dynamic vision transformer for uav tracking. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 3353–3359 (2023). https://doi.org/10.1109/icra48891.2023.10161487

[55] [56]

Fu, C., Lei, X., Zuo, H., Yao, L., Zheng, G., Pan, J.: Progressive representation learning for real-time uav tracking. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5072–5079 (2024). https://doi.org/10.1109/iros58592.2024.10803050

[56] [57]

Wei, Q., Zeng, B., Liu, J., He, L., Zeng, G.: Litetrack: Layer pruning with asynchronous feature extraction for lightweight and efficient visual tracking. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 4968–4975 (2024). https://doi.org/10.1109/icra57147.2024.10610022

[57] [58]

Gopal, G.Y., Amer, M.A.: Separable self and mixed attention transformers for efficient object tracking. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6694–6703 (2024). https://doi.org/10.1109/wacv57701.2024.00657

[58] [59]

Ghiasi, G., Lin, T.-Y., Le, Q.V.: Dropblock: A regularization method for convolutional networks. Advances in Neural Information Processing Systems 31 (2018) arXiv:1810.12890

[59] [60]

Kirillov, A., Mintun, E., et al.: Segment anything. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3992–4003 (2023). https://doi.org/10.1109/iccv51070.2023.00371

[60] [61]

Yun, S., Han, D., Chun, S., Oh, S.J., Yoo, Y., Choe, J.: Cutmix: Regularization strategy to train strong classifiers with localizable features. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6022–6031 (2019). https://doi.org/10.1109/iccv.2019.00612

[61] [62]

Gao, S., Zhou, C., Zhang, J.: Generalized relation modeling for transformer tracking. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18686–18695 (2023). https://doi.org/10.1109/cvpr52729.2023.01792