AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

Aniket Bera; Celso M. de Melo; Dinesh Manocha; Ruiqi Xian; Stephen M. Nogar; Tianrui Guan; Xijun Wang

arxiv: 2303.01589 · v1 · pith:OYGKL2R3new · submitted 2023-03-02 · 💻 cs.CV · cs.RO


Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M. Nogar, Aniket Bera, Dinesh Manocha

classification 💻 cs.CV cs.RO

keywords action, approach, dataset, improvement, temporal, aerial, auto, computational


We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also present an efficient temporal reasoning algorithm that captures action information along the spatial and temporal domains at a controllable computational cost. Our approach has been implemented and evaluated both on a desktop with high-end GPUs and on the low-power Robotics RB5 Platform for robots and drones. In practice, we achieve a 6.1-7.4% improvement over SOTA in Top-1 accuracy on the RoCoG-v2 dataset, an 8.3-10.4% improvement on the UAV-Human dataset, and a 3.2% improvement on the Drone Action dataset.
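The auto zoom idea in the abstract (find the human target, then scale it appropriately) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes some detector has already produced a pixel bounding box for the person, and the function names `zoom_window` and `zoom_factor`, the 20% margin, and the 224-pixel output size are all hypothetical choices made for illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


def zoom_window(bbox: Box, frame_w: int, frame_h: int, margin: float = 0.2) -> Box:
    """Return a square crop window around bbox, padded by margin, inside the frame.

    Hypothetical helper: the bounding box is assumed to come from a detector.
    """
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Side of the square: longer box side plus margin, capped at the shorter frame side.
    side = max(x1 - x0, y1 - y0) * (1.0 + margin)
    side = int(round(min(side, frame_w, frame_h)))
    # Centre the window on the box, then shift it so it stays inside the frame.
    left = int(round(cx - side / 2.0))
    top = int(round(cy - side / 2.0))
    left = max(0, min(left, frame_w - side))
    top = max(0, min(top, frame_h - side))
    return left, top, left + side, top + side


def zoom_factor(window: Box, out_size: int = 224) -> float:
    """Scale applied when the square window is resized to out_size pixels (assumed size)."""
    x0, _, x1, _ = window
    return out_size / float(x1 - x0)


if __name__ == "__main__":
    # A small person in a 1920x1080 aerial frame.
    window = zoom_window((900, 500, 960, 620), 1920, 1080)
    print(window, zoom_factor(window))
```

Cropping to a window like this before feature extraction is one way the target can occupy most of the network input while fewer background pixels are processed, which is the kind of saving the abstract attributes to auto zoom.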


Forward citations

Cited by 1 Pith paper


UAV-OVO: Out-of-Viewpoint Generalization in UAV Action Recognition
cs.CV 2026-05 unverdicted novelty 6.0

UAV-OVO benchmark exposes large ID/OOD performance gaps in video action recognition due to low-to-high depression viewpoint shifts, and LATER uses LoRA subspace anchoring for test-time feature re-centering to reduce drift.