Multi-Cue Vehicle Detection for Semantic Video Compression In Georegistered Aerial Videos

Filiz Bunyak; Guna Seetharaman; Hadi Aliakbarpour; Kannappan Palaniappan; Noor Al-Shakarji

arxiv: 1907.01176 · v1 · pith:ZUYFG32Fnew · submitted 2019-07-02 · 💻 cs.CV

Noor Al-Shakarji, Filiz Bunyak, Hadi Aliakbarpour, Guna Seetharaman, Kannappan Palaniappan

Pith reviewed 2026-05-25 11:28 UTC · model grok-4.3

keywords moving vehicle detection · aerial video · semantic compression · multi-cue fusion · deep learning · flux tensor · UAV video analytics · georegistered video

The pith

Fusing deep learning appearance detections with flux tensor motion filtering identifies moving vehicles in aerial video and enables semantic compression ratios above 100:1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-cue pipeline that combines deep learning for vehicle appearance with flux tensor spatio-temporal filtering for motion to detect moving vehicles from airborne cameras. This approach filters false positives such as parked vehicles by requiring both cues to align, addressing challenges like small object sizes, camera jitter, and scene complexity. The detected moving vehicles supply region-of-interest information that supports semantic video compression achieving ratios over 100:1 while retaining high image fidelity. Such compression improves use of limited-bandwidth air-to-ground links in UAV networks by transmitting only the relevant content.

Core claim

The proposed multi-cue pipeline synergistically fuses deep learning appearance detections and flux tensor spatio-temporal filtering to detect moving vehicles with high precision and recall while filtering out false positives such as parked vehicles, and experimental results show that incorporating contextual information of moving vehicles enables high semantic compression ratios of over 100:1 with high image fidelity.

What carries the argument

The synergistic fusion of deep learning appearance detections and flux tensor motion detections, which requires agreement between cues to suppress false positives from parked vehicles.
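The agreement rule described above can be illustrated with a minimal sketch. It assumes the appearance detector returns axis-aligned boxes and the motion stage returns a binary mask; the names `motion_fraction` and `fuse_detections`, the box format, and the 0.3 threshold are hypothetical choices for illustration, not values taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), max edges exclusive.
Box = Tuple[int, int, int, int]


def motion_fraction(box: Box, motion_mask: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels inside `box` that the motion mask marks as moving."""
    x0, y0, x1, y1 = box
    box_area = (x1 - x0) * (y1 - y0)
    if box_area <= 0:
        return 0.0
    moving = sum(motion_mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return moving / box_area


def fuse_detections(
    boxes: List[Box],
    motion_mask: Sequence[Sequence[int]],
    min_fraction: float = 0.3,  # assumed threshold, not from the paper
) -> List[Box]:
    """Keep only appearance detections that the motion cue also supports."""
    return [b for b in boxes if motion_fraction(b, motion_mask) >= min_fraction]


# Toy 6x8 motion mask with one moving blob in the upper-left region.
motion_mask = [[0] * 8 for _ in range(6)]
for y in range(1, 3):
    for x in range(1, 4):
        motion_mask[y][x] = 1

# First box covers the moving blob; second box stands in for a parked vehicle.
boxes = [(1, 1, 4, 3), (5, 3, 8, 5)]
print(fuse_detections(boxes, motion_mask))  # [(1, 1, 4, 3)]
```

A fractional overlap test is only one way to express cue agreement; the paper's pipeline may use a different criterion.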

If this is right

Moving vehicles are detected with high precision and recall in georegistered aerial videos.
False positives such as parked vehicles are filtered through intelligent cue fusion.
Semantic compression ratios exceed 100:1 while preserving high image fidelity.
Limited bandwidth air-to-ground network links are utilized more efficiently.
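To give a feel for where ratios above 100:1 can come from, here is a back-of-envelope sketch that counts raw pixels only. It assumes that only region-of-interest pixels are transmitted per frame and ignores the codec, the background model, and any overlap between boxes; the frame size and box sizes are hypothetical and are not the paper's experimental settings.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def roi_compression_ratio(frame_w: int, frame_h: int, roi_boxes: List[Box]) -> float:
    """Ratio of full-frame pixels to ROI pixels (overlapping boxes are double-counted)."""
    frame_pixels = frame_w * frame_h
    roi_pixels = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in roi_boxes)
    if roi_pixels == 0:
        return float("inf")  # nothing to send for this frame
    return frame_pixels / roi_pixels


# Hypothetical frame: 1920x1080 with ten non-overlapping 40x20 vehicle boxes.
boxes = [(i * 100, 0, i * 100 + 40, 20) for i in range(10)]
ratio = roi_compression_ratio(1920, 1080, boxes)
print(f"{ratio:.1f}:1")  # 259.2:1
```

The example shows why small objects favor high ratios: ten small vehicles cover 8,000 of 2,073,600 pixels in this toy frame.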

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same fusion logic could be tested on other small moving objects such as pedestrians in the same aerial setting.
Georegistration data already present in the videos could be combined with the detections to produce geographically tagged vehicle tracks.
Onboard implementation of the pipeline would allow real-time selection of regions before transmission rather than post-capture compression.
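The geographic tagging idea can be sketched as a planar homography applied to each detection's center. This assumes a 3×3 homography `H` from image pixels to ground-plane coordinates is available from the georegistration step; the matrix values and the helper names `box_center` and `apply_homography` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_center(box: Box) -> Point:
    """Center of an axis-aligned box in pixel coordinates."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def apply_homography(H: Sequence[Sequence[float]], point: Point) -> Point:
    """Map a pixel through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


# Hypothetical homography: 0.5 ground units per pixel plus an offset of (100, 200).
H: List[List[float]] = [
    [0.5, 0.0, 100.0],
    [0.0, 0.5, 200.0],
    [0.0, 0.0, 1.0],
]

detection = (10.0, 20.0, 50.0, 40.0)
ground_xy = apply_homography(H, box_center(detection))
print(ground_xy)  # (115.0, 215.0)
```

Chaining these ground-plane points over frames would give a geographically tagged track, provided the vehicle stays on the dominant plane.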

Load-bearing premise

The fusion of appearance and motion cues will reliably suppress false positives from parked vehicles and maintain performance across unstated variations in platform motion, camera jitter, obscurations, and degraded imaging conditions.

What would settle it

Running the detection pipeline on aerial video sequences that contain many parked vehicles together with camera jitter or low-contrast conditions and measuring whether false positive rates remain low and compression ratios stay above 100:1.
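A test of this kind needs a way to score detections against ground truth. The sketch below uses greedy one-to-one matching by intersection over union (IoU); the 0.5 threshold and the function names are assumptions for illustration, and the paper's own evaluation protocol may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detections: List[Box], ground_truth: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching: each ground-truth box can be claimed by one detection."""
    unmatched = list(ground_truth)
    true_pos = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(detections) if detections else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Ground truth: two moving vehicles. Detections: one good hit, one parked car.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(precision_recall(detections, ground_truth))  # (0.5, 0.5)
```

Running the same scoring with and without the motion cue on sequences rich in parked vehicles would show how much of the precision comes from the fusion step.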

Figures

Figures reproduced from arXiv: 1907.01176 by Filiz Bunyak, Guna Seetharaman, Hadi Aliakbarpour, Kannappan Palaniappan, Noor Al-Shakarji.

**Figure 1.** Multi-cue moving vehicle detection pipeline using motion, appearance and shape information from detections at …

**Figure 2.** A scene and its dominant ground plane π is observed by an airborne camera while hovering over a scene and passing through n way-points. Each image frame is projected using homography onto the scene dominant plane, π. The homographic transformation of the images of a 3D point like X1, which lies on plane π, all converge to an identical 2D point in π and are coincident to X1. Whereas, for an off-plane 3D p…

**Figure 3.** Loss and average loss for appearance training

**Figure 4.** Building roof-top detection using flux-based motion parallax response. (a) Building parallax response, obtained …

**Figure 5.** Intermediate results and the final result after applying the pipeline. a) Raw data, b) Motion mask overlaid on flux …

**Figure 6.** Semantic compression at the source, onboard an aerial platform, using object detection and embedded processing.

Original abstract

Detection of moving objects such as vehicles in videos acquired from an airborne camera is very useful for video analytics applications. Using fast low power algorithms for onboard moving object detection would also provide region of interest-based semantic information for scene content aware image compression. This would enable more efficient and flexible communication link utilization in low-bandwidth airborne cloud computing networks. Despite recent advances in both UAV or drone platforms and imaging sensor technologies, vehicle detection from aerial video remains challenging due to small object sizes, platform motion and camera jitter, obscurations, scene complexity and degraded imaging conditions. This paper proposes an efficient moving vehicle detection pipeline which synergistically fuses both appearance and motion-based detections in a complementary manner using deep learning combined with flux tensor spatio-temporal filtering. Our proposed multi-cue pipeline is able to detect moving vehicles with high precision and recall, while filtering out false positives such as parked vehicles, through intelligent fusion. Experimental results show that incorporating contextual information of moving vehicles enables high semantic compression ratios of over 100:1 with high image fidelity, for better utilization of limited bandwidth air-to-ground network links.
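For readers unfamiliar with the flux tensor, its trace is usually written as the locally integrated squared temporal derivative of the image gradient, I_xt² + I_yt² + I_tt². The sketch below computes the per-pixel version of that quantity with central differences on a tiny nested-list video. It is a simplified illustration under stated assumptions: it omits the spatio-temporal averaging window and the separable filters that practical implementations typically use, and the function name `flux_tensor_trace` and the `video[t][y][x]` layout are hypothetical.

```python
from typing import List

Frame = List[List[float]]


def flux_tensor_trace(video: List[Frame], t: int) -> Frame:
    """Per-pixel I_xt^2 + I_yt^2 + I_tt^2 at frame t (border pixels left at 0)."""
    if not 1 <= t <= len(video) - 2:
        raise ValueError("t needs one frame before and one frame after it")
    prev_f, cur_f, next_f = video[t - 1], video[t], video[t + 1]
    h, w = len(cur_f), len(cur_f[0])
    trace = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Temporal change of the horizontal and vertical spatial gradients.
            i_xt = ((next_f[y][x + 1] - next_f[y][x - 1])
                    - (prev_f[y][x + 1] - prev_f[y][x - 1])) / 4.0
            i_yt = ((next_f[y + 1][x] - next_f[y - 1][x])
                    - (prev_f[y + 1][x] - prev_f[y - 1][x])) / 4.0
            # Second temporal derivative of intensity.
            i_tt = next_f[y][x] - 2.0 * cur_f[y][x] + prev_f[y][x]
            trace[y][x] = i_xt ** 2 + i_yt ** 2 + i_tt ** 2
    return trace


def frame_with_dot(x_pos: int, size: int = 5) -> Frame:
    """Black frame with a single bright pixel on the middle row."""
    frame = [[0.0] * size for _ in range(size)]
    frame[2][x_pos] = 1.0
    return frame


moving = [frame_with_dot(1), frame_with_dot(2), frame_with_dot(3)]
static = [frame_with_dot(2), frame_with_dot(2), frame_with_dot(2)]

print(flux_tensor_trace(moving, 1)[2][2])  # 4.25
print(flux_tensor_trace(static, 1)[2][2])  # 0.0
```

The static sequence produces no response even though it contains a bright object, which is the property that lets a motion cue of this kind separate moving vehicles from parked ones.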

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Practical pipeline fusing deep learning and flux tensor for moving vehicle detection in aerial video to support semantic compression, but the abstract supplies no numbers or baselines to support the performance claims.

The paper's main contribution is a multi-cue detection pipeline that combines appearance-based deep learning detections with flux tensor motion filtering, then uses the results for region-of-interest semantic compression in georegistered aerial video. The goal is to cut bandwidth on air-to-ground links while keeping moving vehicles and dropping parked ones. That is a reasonable engineering target for UAV analytics under tight network constraints. The fusion logic to suppress false positives from stationary vehicles makes sense on paper and fits the problem setup of platform motion, jitter, and small objects. The authors cite the relevant challenges and position the work as an application-driven combination of existing techniques rather than a new theoretical framework. That part is straightforward and honest.

The central weakness is that the abstract asserts high precision and recall plus compression ratios over 100:1 with high fidelity, yet gives no datasets, no baseline comparisons, no ablation results, and no error bars. Without those numbers it is impossible to tell whether the fusion actually delivers or whether the claims rest on unshown experiments. If the full manuscript contains reproducible tables and code, that would change the picture; on the text provided the performance side stays unevaluated.

This work is aimed at researchers and engineers handling aerial video transmission and onboard analytics rather than the broader computer vision community. A reader already working on UAV compression or moving-object detection in constrained settings could extract useful implementation details. It is not a foundational paper, but the application focus is clear enough that a serious editor should send it for peer review so the experimental claims can be checked against actual data and code.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a multi-cue pipeline for detecting moving vehicles in georegistered aerial videos by synergistically fusing deep-learning appearance detections with flux-tensor motion detections. The approach is intended to suppress false positives such as parked vehicles and to supply region-of-interest information for semantic video compression, with the abstract claiming high precision/recall and compression ratios exceeding 100:1.

Significance. If the performance claims hold under realistic platform motion, jitter, and imaging conditions, the work could improve bandwidth efficiency for air-to-ground links in UAV networks. The absence of any quantitative metrics, datasets, baselines, or ablation results in the supplied text, however, prevents assessment of whether those gains are actually realized.

major comments (1)

Abstract: the central claims of 'high precision and recall' together with 'compression ratios of over 100:1' are asserted without any supporting numbers, datasets, baselines, error bars, or ablation studies. Because these performance figures are the sole justification for the pipeline and its compression application, the manuscript cannot be evaluated on its primary contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the opportunity to address the concerns. We respond to the major comment below.

Referee: Abstract: the central claims of 'high precision and recall' together with 'compression ratios of over 100:1' are asserted without any supporting numbers, datasets, baselines, error bars, or ablation studies. Because these performance figures are the sole justification for the pipeline and its compression application, the manuscript cannot be evaluated on its primary contribution.

Authors: We agree that the abstract asserts strong performance claims without accompanying quantitative details, which prevents full evaluation of the contribution. The manuscript text references experimental results on the multi-cue fusion but does not include the specific supporting numbers, dataset descriptions, baseline comparisons, error bars, or ablation studies in the version provided to the referee. We will revise the manuscript to add these elements to the experimental section (including precision/recall values, the datasets and imaging conditions used, comparisons to appearance-only and motion-only baselines, and ablation results on the fusion strategy) and will update the abstract to reference the quantitative findings more precisely.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical multi-cue detection pipeline that fuses appearance-based deep learning detections with flux-tensor motion filtering to identify moving vehicles and enable semantic compression. No derivation chain, fitted parameters renamed as predictions, self-definitional equations, or load-bearing self-citations are present in the abstract or described methods. The central claims rest on experimental results rather than any mathematical reduction to inputs by construction, making the work self-contained as an engineering approach without circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, axioms, or invented entities; it relies on standard assumptions of computer vision pipelines, such as the utility of appearance and motion cues.

pith-pipeline@v0.9.0 · 5739 in / 1191 out tokens · 64953 ms · 2026-05-25T11:28:31.425014+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 3 internal anchors

[1] ABQ video. http://www.transparentsky.net.

[2] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz. Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(3):555–560, 2008.

[3] S. Agarwal, N. Snavely, I. Simon, S.M. Seitz, and R. Szeliski. Building Rome in a day. In IEEE Int. Conf. on Computer Vision (ICCV), pages 72–79, 2009.

[4] N.M. Al-Shakarji, F. Bunyak, G. Seetharaman, and K. Palaniappan. Robust multi-object tracking with semantic color correlation. In IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–7,

[5] N.M. Al-Shakarji, F. Bunyak, G. Seetharaman, and K. Palaniappan. Multi-object tracking cascade with multi-step data association and occlusion handling. In IEEE Conf. on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6, 2018.

[6] N.M. Al-Shakarji, F. Bunyak, G. Seetharaman, and K. Palaniappan. Robust multi-object tracking for wide area motion imagery. IEEE Conf. on Applied Imagery Pattern Recognition Workshop (AIPR), pages 1–5, 2018.

[7] H. AliAkbarpour, K. Palaniappan, and G. Seetharaman. Parallax-tolerant aerial image georegistration and efficient camera pose refinement without piecewise homographies. IEEE Trans. on Geoscience and Remote Sensing, 55(8):4618–4637, 2017.

[8] A. Basharat et al. Real-time multi-target tracking at 210 megapixels/second in wide area motion imagery. IEEE Workshop on Applications of Computer Vision (WACV), pages 839–846, 2014.

[9] F. Bunyak, K. Palaniappan, S.K. Nath, and G. Seetharaman. Flux tensor constrained geodesic active contours with sensor fusion for persistent object tracking. Journal of Multimedia, 2(4):20, 2007.

[10] F. Bunyak, K. Palaniappan, S.K. Nath, and G. Seetharaman. Geodesic active contour based fusion of visible and infrared video for persistent object tracking. In IEEE Workshop on Applications of Computer Vision (WACV), pages 35–35, 2007.

[11] R.O. Chavez-Garcia and O. Aycard. Multiple sensor fusion and classification for moving object detection and tracking. IEEE Trans. on Intelligent Transportation Systems, 17(2):525–534, 2016.

[12] A. Ekin, A.M. Tekalp, and R. Mehrotra. Automatic soccer video analysis and summarization. IEEE Transactions on Image Processing, 12(7):796–807, 2003.

[13] M.E. Farmer, X. Lu, H. Chen, and A.K. Jain. Robust motion-based image segmentation using fusion. IEEE Int. Conf. on Image Processing, 5:3375–3378, 2004.

[14] T. Gautama and M.A. Van Hulle. A phase-based approach to the estimation of the optical flow field using spatial filtering. IEEE Trans. on Neural Networks, 13(5):1127–1136, 2002.

[15] R. Girshick. Fast R-CNN. In IEEE Int. Conf. on Computer Vision (ICCV), pages 1440–1448, 2015.

[16] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 38(1):142–158, 2016.

[17] R.I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. 2003.

[18] B. Heo, K. Yun, and J.Y. Choi. Appearance and motion based deep learning architecture for moving object detection in moving camera. In IEEE Int. Conf. on Image Processing (ICIP), pages 1827–1831, 2017.

[19] M.R. James, S. Robson, et al. Optimising UAV topographic surveys processed with structure-from-motion: Ground control quality, quantity and bundle adjustment. Geomorphology, 280:51–66, 2017.

[20] D. Lam, R. Kuzma, K. McGee, S. Dooley, M. Laielli, M. Klaric, Y. Bulatov, and B. McCord. xView: Objects in context in overhead imagery. arXiv:1802.07856, 2018.

[21] Y.J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1346–1353, 2012.

[22] M.E. Linger and A.A. Goshtasby. Aerial image registration for tracking. IEEE Transactions on Geoscience and Remote Sensing, 53(4):2137–2145, 2015.

[23] W. Liu et al. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV), volume LNCS 9905, pages 21–37, 2016.

[24] S. Lyu et al. UA-DETRAC 2017: Report of AVSS2017 & IWT4S challenge on advanced traffic monitoring. In IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), pages 1–7, 2017.

[25] H.H. Nagel and A. Gehrke. Spatiotemporally adaptive estimation and segmentation of OF-Fields. In European Conference on Computer Vision (ECCV), volume LNCS 1407, pages 86–102, 1998.

[26] M. Naphade et al. The 2018 NVIDIA AI city challenge. In IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pages 53–60, 2017.

[27] S. Nath and K. Palaniappan. Adaptive robust structure tensors for orientation estimation and image segmentation. In LNCS-3804: Proc. ISVC'05, pages 445–453, 2005.

[28] K. Palaniappan, I. Ersoy, and S.K. Nath. Moving object segmentation using the flux tensor for biological video microscopy. In Pacific-Rim Conference on Multimedia, pages 483–493, 2007.

[29] K. Palaniappan, R. Rao, and G. Seetharaman. Wide-area persistent airborne video: Architecture and challenges. In B. Bhanu et al., editors, Distributed Video Sensor Networks: Research Challenges and Future Directions, chapter 24, pages 349–371. Springer, 2011.

[30] S. Razakarivony and F. Jurie. Vehicle detection in aerial imagery: A small target detection benchmark. Journal of Visual Communication and Image Representation, 34:187–203, 2016.

[31] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In IEEE Conf. Computer Vision and Pattern Recognition, pages 779–788, 2016.

[32] J. Redmon and A. Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.

[33] J. Schneider, C. Eling, L. Klingbeil, H. Kuhlmann, W. Förstner, and C. Stachniss. Fast and effective online pose estimation and mapping for UAVs. In IEEE Int. Conf. on Robotics and Automation (ICRA), pages 4784–4791, 2016.

[34] M.J. Shafiee, B. Chywl, F. Li, and A. Wong. Fast YOLO: A fast you only look once system for real-time embedded object detection in video. arXiv:1709.05943, 2017.

[35] M. Siam, H. Mahgoub, M. Zahran, S. Yogamani, M. Jagersand, and A. El-Sallab. MODNET: Moving object detection network with motion and appearance for autonomous driving. Int. Conf. Intelligent Transportation Systems, 2017.

[36] J. Van De Weijer, T. Gevers, and A.W.M. Smeulders. Robust photometric invariant features from the color tensor. IEEE Trans. on Image Processing, 15(1):118–127, 2006.

[37] R. Wang, F. Bunyak, G. Seetharaman, and K. Palaniappan. Static and moving object detection using flux tensor with split Gaussian models. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pages 414–418, 2014.

[38] C. Yuan, G. Medioni, J. Kang, and I. Cohen. Detecting motion regions in the presence of a strong parallax from a moving camera by multiview geometric constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9):1627–1641, 2007.

[39] P. Zhu et al. VisDrone-VDT2018: The vision meets drone video detection and tracking challenge results. In European Conference on Computer Vision (ECCV), volume LNCS 11133, pages 496–518, 2019.
