Recognition: unknown
Instance-level Visual Active Tracking with Occlusion-Aware Planning
Pith reviewed 2026-05-09 21:56 UTC · model grok-4.3
The pith
OA-VAT uses DINOv3 prototypes and a conditional diffusion planner to track specific targets in 3D despite distractors and occlusions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OA-VAT is a unified three-module system. The first module initializes discriminative instance prototypes offline by aggregating multi-view augmented features from DINOv3. The second module enhances those prototypes online while applying a confidence-aware Kalman filter to maintain stable tracking under changing appearance and motion. The third module trains a conditional diffusion planner on the new Planning-20k dataset to output obstacle-avoiding trajectories that enable occlusion recovery. Together these components deliver 0.93 average success rate on UnrealCV, 90.8 percent collision avoidance on real-world sets, and 81.6 percent tracking success on a DJI Tello drone at 35 frames per real.
What carries the argument
The three-module OA-VAT pipeline, in which the occlusion-aware conditional diffusion planner produces safe trajectories while the prototype tracker supplies instance-level discrimination.
Load-bearing premise
The conditional diffusion planner trained on Planning-20k will generalize to occlusions and obstacles in unseen real-world settings.
What would settle it
Run the system on a new physical scene containing obstacle layouts and occlusion durations absent from Planning-20k and measure whether success rate falls more than 10 percent below the reported 90.8 percent CAR.
Figures
read the original abstract
Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an Occlusion-Aware Trajectory Planner, trained on our new Planning-20k dataset, uses conditional diffusion to generate obstacle-avoiding paths for occlusion recovery. Experiments demonstrate OA-VAT achieves 0.93 average SR on UnrealCV (+2.2% vs. SOTA TrackVLA), 90.8% average CAR on real-world datasets (+12.1% vs. SOTA GC-VAT), and 81.6% TSR on a DJI Tello drone. Running at 35 FPS on an RTX 3090, it delivers robust, real-time performance for practical deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes OA-VAT, a unified pipeline for instance-level visual active tracking (VAT) comprising three modules: (1) a training-free Instance-Aware Offline Prototype Initialization that aggregates multi-view features from DINOv3 to build discriminative prototypes against distractors, (2) an Online Prototype Enhancement Tracker that updates prototypes and employs a confidence-aware Kalman filter for robustness to appearance/motion changes, and (3) an Occlusion-Aware Trajectory Planner trained via conditional diffusion on the new Planning-20k dataset to generate obstacle-avoiding paths. The central empirical claims are 0.93 average success rate (SR) on UnrealCV (+2.2% over TrackVLA), 90.8% average collision avoidance rate (CAR) on real-world datasets (+12.1% over GC-VAT), and 81.6% tracking success rate (TSR) on a DJI Tello drone, all at 35 FPS on RTX 3090.
Significance. If the reported gains and real-world transfer hold, the work would meaningfully advance practical VAT for drones and surveillance by jointly addressing instance discrimination and active occlusion recovery, with the Planning-20k dataset and conditional diffusion planner as potentially reusable contributions. The training-free nature of the first two modules and real-time hardware validation are notable strengths that could support deployment if generalization is better substantiated.
major comments (2)
- [Experiments] Experiments section (and abstract performance claims): the reported metrics (0.93 SR, 90.8% CAR, 81.6% TSR) and SOTA comparisons lack any description of experimental protocols, data splits, number of trials, error bars, or statistical tests, which directly undermines assessment of whether the +2.2% and +12.1% margins are reliable or reproducible.
- [Occlusion-Aware Trajectory Planner] Occlusion-Aware Trajectory Planner and Planning-20k dataset sections: the central claim that the conditional diffusion planner enables reliable occlusion recovery in real deployment rests on unexamined sim-to-real transfer; no domain-randomization details, out-of-distribution failure analysis, or explicit comparison of occlusion/obstacle distributions between Planning-20k and the real-world test sets are provided, making the 90.8% CAR and 81.6% TSR results load-bearing but insufficiently supported.
minor comments (2)
- [Introduction] The abstract and introduction would benefit from a brief diagram or pseudocode overview of the three-module pipeline to clarify data flow between prototype initialization, online tracking, and planning.
- [Online Prototype Enhancement Tracker] Notation for the Kalman filter confidence weighting and the conditional diffusion conditioning variables should be defined explicitly in the methods section rather than left implicit.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have carefully addressed each major comment below and revised the manuscript to improve experimental transparency and analysis of sim-to-real transfer.
read point-by-point responses
-
Referee: [Experiments] Experiments section (and abstract performance claims): the reported metrics (0.93 SR, 90.8% CAR, 81.6% TSR) and SOTA comparisons lack any description of experimental protocols, data splits, number of trials, error bars, or statistical tests, which directly undermines assessment of whether the +2.2% and +12.1% margins are reliable or reproducible.
Authors: We agree that the original manuscript did not provide sufficient detail on experimental protocols, which limits reproducibility assessment. In the revised version, we have expanded Section 4 (Experiments) with a dedicated subsection on experimental setup. This includes: explicit data splits for the Planning-20k dataset and evaluation benchmarks; the number of trials (100 independent episodes per scenario across UnrealCV and real-world tests); error bars as standard deviations computed over 5 random seeds; and paired t-test results confirming statistical significance of the reported gains (p < 0.05 for both the SR and CAR improvements). These additions substantiate the reliability of the margins without altering the core results. revision: yes
-
Referee: [Occlusion-Aware Trajectory Planner] Occlusion-Aware Trajectory Planner and Planning-20k dataset sections: the central claim that the conditional diffusion planner enables reliable occlusion recovery in real deployment rests on unexamined sim-to-real transfer; no domain-randomization details, out-of-distribution failure analysis, or explicit comparison of occlusion/obstacle distributions between Planning-20k and the real-world test sets are provided, making the 90.8% CAR and 81.6% TSR results load-bearing but insufficiently supported.
Authors: We acknowledge that the original manuscript provided insufficient analysis of sim-to-real transfer for the Occlusion-Aware Trajectory Planner. We have revised the manuscript by adding a new subsection (4.5) that details: the domain randomization strategies used during diffusion model training (randomized obstacle densities, lighting conditions, and occlusion durations); quantitative comparisons of occlusion and obstacle distributions between Planning-20k and the real DJI Tello test sets (via histograms and summary statistics on obstacle count and occlusion length); and an out-of-distribution failure analysis with representative failure cases and their frequency. The real-world CAR and TSR results remain as empirical evidence of transfer, but we have added explicit caveats on the remaining domain gap and future directions for further bridging it. revision: yes
Circularity Check
No significant circularity; empirical results independent of inputs.
full rationale
The paper presents a modular system (training-free prototype initialization via DINOv3, online Kalman-enhanced tracking, and conditional diffusion planner trained on the new Planning-20k dataset) whose performance is measured via direct comparisons to external SOTA baselines on UnrealCV, real-world datasets, and DJI Tello drone trials. No mathematical derivation chain, self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the abstract or described pipeline. All reported gains (SR, CAR, TSR) are empirical and falsifiable against independent benchmarks, keeping the central claims self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- Hyperparameters of the conditional diffusion model and Kalman filter
axioms (1)
- domain assumption DINOv3 features enable reliable multi-view instance discrimination without task-specific training
invented entities (2)
-
OA-VAT unified pipeline
no independent evidence
-
Planning-20k dataset
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Vi- sual tracking with online multiple instance learning
Boris Babenko, Ming-Hsuan Yang, and Serge Belongie. Vi- sual tracking with online multiple instance learning. In2009 IEEE Conference on computer vision and Pattern Recogni- tion, pages 983–990. IEEE, 2009. 1
2009
-
[2]
Fully-convolutional siamese networks for object tracking
Luca Bertinetto, Jack Valmadre, Joao F Henriques, Andrea Vedaldi, and Philip HS Torr. Fully-convolutional siamese networks for object tracking. InComputer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8- 10 and 15-16, 2016, Proceedings, Part II 14, pages 850–865. Springer, 2016. 1
2016
-
[3]
Simple online and realtime tracking
Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. Simple online and realtime tracking. In2016 IEEE international conference on image processing (ICIP), pages 3464–3468. IEEE, 2016. 1
2016
-
[4]
Learning discriminative model prediction for track- ing
Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Learning discriminative model prediction for track- ing. InProceedings of the IEEE/CVF international confer- ence on computer vision, pages 6182–6191, 2019. 1, 2, 6, 12, 14
2019
-
[5]
Seqtrack: Sequence to sequence learning for visual ob- ject tracking
Xin Chen, Houwen Peng, Dong Wang, Huchuan Lu, and Han Hu. Seqtrack: Sequence to sequence learning for visual ob- ject tracking. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14572–14581, 2023. 1
2023
-
[6]
Yolo-world: Real-time open- vocabulary object detection
Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xing- gang Wang, and Ying Shan. Yolo-world: Real-time open- vocabulary object detection. InProc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2024. 1
2024
-
[7]
Diffusion policy: Visuomotor policy learning via action dif- fusion.The International Journal of Robotics Research,
Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action dif- fusion.The International Journal of Robotics Research,
-
[8]
Cimpoi, S
M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. InProceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014. 5
2014
-
[9]
Dji tello
Da-Jiang Innovations. Dji tello. https://store.dji.com/product/tello, 2025. 6, 9
2025
-
[10]
Djitellopy
damiafuentes. Djitellopy. https://github.com/damiafuentes/DJITelloPy, 2025. 15
2025
-
[11]
Stable and consistent object track- ing: An active vision approach
Dibyendu Kumar Das, Mouli Laha, Somajyoti Majumder, and Dipnarayan Ray. Stable and consistent object track- ing: An active vision approach. InAdvanced Computa- tional and Communication Paradigms: Proceedings of In- ternational Conference on ICACCP 2017, Volume 2, pages 299–308. Springer, 2018. 2
2017
-
[12]
Enhancing continuous control of mobile robots for end-to- end visual active tracking.Robotics and Autonomous Sys- tems, 142:103799, 2021
Alessandro Devo, Alberto Dionigi, and Gabriele Costante. Enhancing continuous control of mobile robots for end-to- end visual active tracking.Robotics and Autonomous Sys- tems, 142:103799, 2021. 1
2021
-
[13]
D-vat: End-to-end visual active tracking for micro aerial vehicles.IEEE Robotics and Automation Letters, 9(6):5046–5053, 2024
Alberto Dionigi, Simone Felicioni, Mirko Leomanni, and Gabriele Costante. D-vat: End-to-end visual active tracking for micro aerial vehicles.IEEE Robotics and Automation Letters, 9(6):5046–5053, 2024. 1, 2, 6, 12, 15
2024
-
[14]
The unmanned aerial vehicle benchmark: Object detection and tracking
Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, and Qi Tian. The unmanned aerial vehicle benchmark: Object detection and tracking. InProceedings of the European con- ference on computer vision (ECCV), pages 370–386, 2018. 6, 8
2018
-
[15]
A review of quadro- tor: An underactuated mechanical system.Annual Reviews in Control, 46:165–180, 2018
Bara J Emran and Homayoun Najjaran. A review of quadro- tor: An underactuated mechanical system.Annual Reviews in Control, 46:165–180, 2018. 1
2018
-
[16]
Lasot: A high-quality benchmark for large-scale single ob- ject tracking
Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, and Haibin Ling. Lasot: A high-quality benchmark for large-scale single ob- ject tracking. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5374–5383,
-
[17]
Aerial vision-and-dialog nav- igation
Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, and Xin Eric Wang. Aerial vision-and-dialog nav- igation. InFindings of the Association for Computational Linguistics: ACL 2023, pages 3043–3061, Toronto, Canada,
2023
-
[18]
Association for Computational Linguistics. 1
-
[19]
Hart, Nils J
Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A for- mal basis for the heuristic determination of minimum cost paths.IEEE Transactions on Systems Science and Cybernet- ics, 4(2):100–107, 1968. 5
1968
-
[20]
A new approach to linear filtering and prediction problems
Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. 1960. 4, 7
1960
-
[21]
Segment any- thing
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF international confer- ence on computer vision, pages 4015–4026, 2023. 3, 12
2023
-
[22]
A novel performance evaluation methodology for single-target trackers.IEEE transactions on pattern analysis and machine intelligence, 38(11):2137– 2155, 2016
Matej Kristan, Jiri Matas, Ale ˇs Leonardis, Tom´aˇs V oj´ıˇr, Ro- man Pflugfelder, Gustavo Fernandez, Georg Nebehay, Fatih Porikli, and Luka ˇCehovin. A novel performance evaluation methodology for single-target trackers.IEEE transactions on pattern analysis and machine intelligence, 38(11):2137– 2155, 2016. 6, 8
2016
-
[23]
High performance visual tracking with siamese region pro- posal network
Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High performance visual tracking with siamese region pro- posal network. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 8971–8980,
-
[24]
Siamrpn++: Evolution of siamese vi- sual tracking with very deep networks
Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, and Junjie Yan. Siamrpn++: Evolution of siamese vi- sual tracking with very deep networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4282–4291, 2019. 1
2019
-
[25]
Visual object tracking for un- manned aerial vehicles: A benchmark and new motion mod- els
Siyi Li and Dit-Yan Yeung. Visual object tracking for un- manned aerial vehicles: A benchmark and new motion mod- els. InProceedings of the AAAI conference on artificial in- telligence, 2017. 6, 8
2017
-
[26]
Aerialvln: Vision-and-language navigation for uavs
Shubo Liu, Hongsheng Zhang, Yuankai Qi, Peng Wang, Yan- ning Zhang, and Qi Wu. Aerialvln: Vision-and-language navigation for uavs. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 15384– 15394, 2023. 1
2023
-
[27]
Grounding dino: Marrying dino with grounded pre-training for open-set object detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. InEuro- pean conference on computer vision, pages 38–55. Springer,
-
[28]
Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, and Yizhou Wang. End-to-end active object track- ing and its real-world deployment via reinforcement learn- ing.IEEE transactions on pattern analysis and machine in- telligence, 42(6):1317–1332, 2019. 1, 6, 12, 14
2019
-
[29]
Follow anything: Open- set detection, tracking, and following in real-time.IEEE Robotics and Automation Letters, 9(4):3283–3290, 2024
Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallab- hula, Makram Chahine, Daniel M V ogt, Robert J Wood, An- tonio Torralba, and Daniela Rus. Follow anything: Open- set detection, tracking, and following in real-time.IEEE Robotics and Automation Letters, 9(4):3283–3290, 2024. 1, 2, 6, 8, 9, 12, 14, 15
2024
-
[30]
Directional stability of automatically steered bodies.Journal of the American Society for Naval Engineers, 34(2):280–309, 1922
Nicolas Minorsky. Directional stability of automatically steered bodies.Journal of the American Society for Naval Engineers, 34(2):280–309, 1922. 2, 4, 7, 8, 12, 14
1922
-
[31]
Improved denoising diffusion probabilistic models
Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. InInternational conference on machine learning, pages 8162–8171. PMLR,
-
[32]
Maxime Oquab, Timoth ´ee Darcet, Theo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Rus- sell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang- Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nico- las Ballas, Gabriel Synnaeve, Ishan Misra, Herve Jegou, Julien Mairal, Patri...
2023
-
[33]
Fast-tracker 2.0: Improving autonomy of aerial tracking with active vision and human location regression
Neng Pan, Ruibin Zhang, Tiankai Yang, Can Cui, Chao Xu, and Fei Gao. Fast-tracker 2.0: Improving autonomy of aerial tracking with active vision and human location regression. IET Cyber-Systems and Robotics, 3(4):292–301, 2021. 2
2021
-
[34]
Unrealcv: Virtual worlds for computer vision
Weichao Qiu, Fangwei Zhong, Yi Zhang, Siyuan Qiao, Zi- hao Xiao, Tae Soo Kim, and Yizhou Wang. Unrealcv: Virtual worlds for computer vision. InProceedings of the 25th ACM international conference on Multimedia, pages 1221–1224,
-
[35]
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos.arXiv preprint arXiv:2408.00714, 2024. 6, 14, 15
work page internal anchor Pith review arXiv 2024
-
[36]
U- net: Convolutional networks for biomedical image segmen- tation
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. InInternational Conference on Medical image com- puting and computer-assisted intervention, pages 234–241. Springer, 2015. 5
2015
-
[37]
Schedl, Indrajit Kurmi, and Oliver Bimber
David C. Schedl, Indrajit Kurmi, and Oliver Bimber. An autonomous drone for search and rescue in forests using air- borne optical sectioning.Science Robotics, 6, 2021. 1
2021
-
[38]
Large scale real- world multi person tracking
Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew Berneshawi, Alyssa Boden, and Joe Tighe. Large scale real- world multi person tracking. InEuropean Conference on Computer Vision. Springer, 2022. 7
2022
-
[39]
Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 3, 7, 13
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[40]
Vi- sual tracking: An experimental survey.IEEE transactions on pattern analysis and machine intelligence, 36(7):1442– 1468, 2013
Arnold WM Smeulders, Dung M Chu, Rita Cucchiara, Si- mone Calderara, Afshin Dehghan, and Mubarak Shah. Vi- sual tracking: An experimental survey.IEEE transactions on pattern analysis and machine intelligence, 36(7):1442– 1468, 2013. 1
2013
-
[41]
Open-world drone active tracking with goal- centered rewards
Haowei Sun, Jinwu Hu, Zhirui Zhang, Haoyuan Tian, Xinze Xie, Yufeng Wang, Xiaohua Xie, Yun Lin, Zhuliang Yu, and Mingkui Tan. Open-world drone active tracking with goal- centered rewards. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 1, 2, 6, 7, 8, 12, 15
2025
-
[42]
Yoloe: Real-time seeing anything
Ao Wang, Lihao Liu, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding. Yoloe: Real-time seeing anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24591–24602, 2025. 3, 13
2025
-
[43]
Fast online object tracking and segmentation: A unifying approach
Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, and Philip HS Torr. Fast online object tracking and segmentation: A unifying approach. InProceedings of the IEEE/CVF con- ference on Computer Vision and Pattern Recognition, pages 1328–1338, 2019. 1
2019
-
[44]
Trackvla: Embodied visual tracking in the wild.arXiv preprint arXiv:2505.23189, 2025a
Shaoan Wang, Jiazhao Zhang, Minghan Li, Jiahang Liu, Anqi Li, Kui Wu, Fangwei Zhong, Junzhi Yu, Zhizheng Zhang, and He Wang. Trackvla: Embodied visual tracking in the wild.arXiv preprint arXiv:2505.23189, 2025. 1, 6, 12, 14, 15
-
[45]
Towards realistic uav vision-language navigation: Plat- form, benchmark, and methodology, 2024
Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, and Si Liu. Towards realistic uav vision-language navigation: Plat- form, benchmark, and methodology, 2024. 1
2024
-
[46]
Detection, tracking, and counting meets drones in crowds: A benchmark
Longyin Wen, Dawei Du, Pengfei Zhu, Qinghua Hu, Qilong Wang, Liefeng Bo, and Siwei Lyu. Detection, tracking, and counting meets drones in crowds: A benchmark. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7812–7821, 2021. 1
2021
-
[47]
Learning occlusion-robust vision transformers for real-time uav track- ing
You Wu, Xucheng Wang, Xiangyang Yang, Mengyuan Liu, Dan Zeng, Hengzhou Ye, and Shuiwang Li. Learning occlusion-robust vision transformers for real-time uav track- ing. InCVPR, 2025. 4, 8
2025
-
[48]
Multi-uav cooperative system for search and rescue based on yolov5.International Journal of Disaster Risk Reduction, 76:102972, 2022
Linjie Xing, Xiaoyan Fan, Yaxin Dong, Zenghui Xiong, Lin Xing, Yang Yang, Haicheng Bai, and Chengjiang Zhou. Multi-uav cooperative system for search and rescue based on yolov5.International Journal of Disaster Risk Reduction, 76:102972, 2022. 1
2022
-
[49]
Joint feature learning and relation modeling for tracking: A one-stream framework
Botao Ye, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. Joint feature learning and relation modeling for tracking: A one-stream framework. InEuropean Conference on Computer Vision, pages 341–357. Springer, 2022. 1
2022
-
[50]
From poses to identity: Training-free per- son re-identification via feature centralization
Chao Yuan, Guiwei Zhang, Changxiao Ma, Tianyi Zhang, and Guanglin Niu. From poses to identity: Training-free per- son re-identification via feature centralization. InProceed- ings of the Computer Vision and Pattern Recognition Con- ference, pages 24409–24418, 2025. 3
2025
-
[51]
Multimodal pretrained knowl- edge for real-world object navigation.Machine Intelligence Research, 22(4):713–729, 2025
Hui Yuan, Yan Huang, Naigong Yu, Dongbo Zhang, Zetao Du, Ziqi Liu, and Kun Zhang. Multimodal pretrained knowl- edge for real-world object navigation.Machine Intelligence Research, 22(4):713–729, 2025. 1
2025
-
[52]
A novel uav path planning approach: Heuristic cross- ing search and rescue optimization algorithm.Expert Sys- tems with Applications, 215:119243, 2023
Chaoqun Zhang, Wenjuan Zhou, Weidong Qin, and Weidong Tang. A novel uav path planning approach: Heuristic cross- ing search and rescue optimization algorithm.Expert Sys- tems with Applications, 215:119243, 2023. 1
2023
-
[53]
Ad-vat: An asymmetric dueling mechanism for learning visual active tracking
Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, and Yizhou Wang. Ad-vat: An asymmetric dueling mechanism for learning visual active tracking. InInternational Confer- ence on Learning Representations, 2019. 1, 6, 14
2019
-
[54]
Ad-vat+: An asymmetric dueling mechanism for learning and understanding visual active tracking.IEEE transactions on pattern analysis and machine intelligence, 43(5):1467–1482, 2019
Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, and Yizhou Wang. Ad-vat+: An asymmetric dueling mechanism for learning and understanding visual active tracking.IEEE transactions on pattern analysis and machine intelligence, 43(5):1467–1482, 2019. 1, 6, 12, 14
2019
-
[55]
Towards distraction-robust active visual track- ing
Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, and Yizhou Wang. Towards distraction-robust active visual track- ing. InInternational Conference on Machine Learning, pages 12782–12792. PMLR, 2021. 6, 14
2021
-
[56]
Rspt: reconstruct surroundings and predict trajectory for generalizable active object tracking
Fangwei Zhong, Xiao Bi, Yudi Zhang, Wei Zhang, and Yizhou Wang. Rspt: reconstruct surroundings and predict trajectory for generalizable active object tracking. InPro- ceedings of the AAAI Conference on Artificial Intelligence, pages 3705–3714, 2023. 1, 6, 12, 14
2023
-
[57]
Empowering embodied visual tracking with visual foundation models and offline rl
Fangwei Zhong, Kui Wu, Hai Ci, Churan Wang, and Hao Chen. Empowering embodied visual tracking with visual foundation models and offline rl. InEuropean Conference on Computer Vision, pages 139–155. Springer, 2024. 1, 2, 6, 7, 8, 9, 12, 14, 15
2024
-
[58]
Zero-shot skeleton-based action recognition with prototype-guided feature alignment.IEEE Transactions on Image Processing, 34:4602–4617, 2025
Kai Zhou, Shuhai Zhang, Zeng You, Jinwu Hu, Mingkui Tan, and Fei Liu. Zero-shot skeleton-based action recognition with prototype-guided feature alignment.IEEE Transactions on Image Processing, 34:4602–4617, 2025. 12
2025
-
[59]
Curml: A curriculum machine learning library
Yuwei Zhou, Hong Chen, Zirui Pan, Chuanhao Yan, Fanqi Lin, Xin Wang, and Wenwu Zhu. Curml: A curriculum machine learning library. InProceedings of the 30th ACM International Conference on Multimedia, pages 7359–7363,
-
[60]
Curbench: curriculum learning benchmark
Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Wenwu Zhu, et al. Curbench: curriculum learning benchmark. InForty-first International Conference on Ma- chine Learning, 2024. 12
2024
-
[61]
1 N NX i=1 ∥fk,i −g∥ 2 2 # . (17) By Assumption 1, RHS of Eq.(17)is bounded by: E
Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, Qinghua Hu, and Haibin Ling. Detection and tracking meet drones challenge.IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 44(11):7380–7399, 2022. 1 We organize the supplementary materials as follows. Section A reviews related work on visual active tracking. Section B presents the c...
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.