HDST-GNN: Heterogeneous Dynamic Spatiotemporal Graph Neural Networks for Multi-Object Tracking in UAV Aerial Imagery

Phillip Jiang

arxiv: 2606.05587 · v1 · pith:RPXQDNH2new · submitted 2026-06-04 · 💻 cs.CV · cs.AI · cs.LG

Pith reviewed 2026-06-28 02:44 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG

keywords multi-object tracking · UAV imagery · graph neural networks · heterogeneous graphs · occlusion handling · altitude adaptation · data association · VisDrone

The pith

HDST-GNN reduces identity switches in UAV multi-object tracking by adapting graph edges to altitude, using distinct node types, and gating aggregation by occlusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HDST-GNN to address multi-object tracking challenges in UAV imagery, including varying altitudes, small dense objects, and frequent occlusions that cause identity switches. It introduces three components: altitude-adaptive edge construction that estimates camera height from mean object area to set connectivity radius, heterogeneous node representations that treat detections, confirmed tracklets, and lost tracklets as distinct types with typed relations, and occlusion-gated temporal aggregation that limits attention from occluded nodes. The model is trained end-to-end using a differentiable Sinkhorn head with cross-entropy and triplet losses. On VisDrone2019-MOT with oracle detections it reaches 94.51 percent MOTA and 97.24 percent IDF1, outperforming SORT by 5 MOTA points and cutting identity switches by 81 percent; with real YOLOv8n detections it cuts switches by 49 percent. Ablation studies are cited to show each component contributes independently.

Core claim

HDST-GNN is a heterogeneous dynamic spatiotemporal graph neural network whose altitude-adaptive edge construction estimates a camera-altitude proxy from mean object area to adjust connectivity radius, whose heterogeneous node representation models detections as Type-D, confirmed tracklets as Type-T, and lost tracklets as Type-L with dedicated projections and typed edge relations, and whose occlusion-gated temporal aggregation gates each node's attention contribution by occlusion confidence, yielding 94.51 percent MOTA and 97.24 percent IDF1 on VisDrone2019-MOT with oracle detections and reducing identity switches by 49 percent versus SORT with real detections.
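The altitude-adaptive radius can be illustrated with a minimal sketch. The exact functional form is not given on this page, so everything below is an assumption: the proxy is taken to be a negated z-score of log mean object area, and the radius a clipped linear function of it. The constants are hypothetical and were chosen only so that the two example frames in the Figure 2 caption (mean area ≈ 120 px² and ≈ 900 px²) map to roughly the quoted values; they are not the paper's parameters.

```python
import math
from statistics import fmean

# Hypothetical constants, fitted by hand to the two example frames in the
# Figure 2 caption. They are NOT taken from the paper.
LOG_AREA_MEAN = 6.0      # assumed dataset mean of log(mean object area)
LOG_AREA_STD = 1.0       # assumed dataset std of log(mean object area)
BASE_RADIUS_PX = 188.0   # assumed radius at z_hat = 0
RADIUS_SLOPE_PX = 65.0   # assumed radius decrease per unit of z_hat
MIN_RADIUS_PX, MAX_RADIUS_PX = 50.0, 400.0


def altitude_proxy(box_areas_px2):
    """Return z_hat, which grows as objects look smaller (camera is higher)."""
    mean_area = fmean(box_areas_px2)
    return (LOG_AREA_MEAN - math.log(mean_area)) / LOG_AREA_STD


def effective_radius(z_hat):
    """Map the altitude proxy to a connectivity radius in pixels, with clipping."""
    radius = BASE_RADIUS_PX - RADIUS_SLOPE_PX * z_hat
    return min(max(radius, MIN_RADIUS_PX), MAX_RADIUS_PX)


def build_edges(centers, radius):
    """Connect every pair of detections whose centres lie within `radius`."""
    edges = []
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= radius:
                edges.append((i, j))
    return edges
```

With these assumed constants a frame of small objects yields a tighter graph than a frame of large ones, which is the qualitative behaviour the caption describes. A learned or differently parameterised mapping would serve the same purpose.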

What carries the argument

The three components of HDST-GNN: Altitude-Adaptive Edge Construction using mean object area as altitude proxy, Heterogeneous Node Representation with Type-D, Type-T and Type-L nodes and typed relations, and Occlusion-Gated Temporal Aggregation that modulates attention by occlusion confidence.
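One way to picture the heterogeneous node representation is a separate linear projection per node type, mapping type-specific features into a shared embedding space. The sketch below assumes this reading. The feature sizes, the embedding width, and the names `NodeType` and `TypedProjector` are hypothetical, and the weights are random stand-ins for learned parameters. The Figure 1 caption mentions five typed edge relations, but this page does not list them, so none are modelled here.

```python
import random
from enum import Enum


class NodeType(Enum):
    DETECTION = "D"   # Type-D: detections in the current frame
    TRACKLET = "T"    # Type-T: confirmed tracklets
    LOST = "L"        # Type-L: lost tracklets


def make_projection(in_dim, out_dim, rng):
    """Random out_dim x in_dim matrix standing in for a learned projection."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


class TypedProjector:
    """Holds one dedicated projection per node type (sizes are assumptions)."""

    def __init__(self, in_dims, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dims = dict(in_dims)
        self.weights = {
            node_type: make_projection(dim, out_dim, rng)
            for node_type, dim in in_dims.items()
        }

    def __call__(self, node_type, features):
        if len(features) != self.in_dims[node_type]:
            raise ValueError("feature length does not match this node type")
        matrix = self.weights[node_type]
        # Plain matrix-vector product into the shared embedding space.
        return [sum(w * x for w, x in zip(row, features)) for row in matrix]
```

Because each type has its own input size, a lost tracklet could carry extra state, such as time since last observation, without forcing that field onto detections.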

If this is right

Altitude-adaptive edges allow the graph to maintain appropriate spatial context as UAV height changes across sequences.
Heterogeneous node types and typed relations prevent uniform treatment of detections versus active and lost tracklets.
Occlusion gating prevents corrupted embeddings from propagating through the temporal aggregation step.
End-to-end training with the Sinkhorn head produces a fully differentiable association pipeline.
Performance gains hold for both perfect oracle detections and noisy real detections from YOLOv8n.
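The occlusion-gating point in the list above can be sketched as attention whose per-neighbour weights are scaled by (1 - occlusion confidence) and then renormalised. This is an assumed formulation, not the paper's equation: neighbour features double as keys and values, there are no learned projections, and the fallback for a fully occluded neighbourhood is a choice made for this example.

```python
import math


def gated_attention(query, neighbour_feats, occlusion_conf):
    """Aggregate neighbour features, down-weighting occluded neighbours.

    query           : list of floats, embedding of the receiving node
    neighbour_feats : list of embeddings, used as both keys and values
    occlusion_conf  : per-neighbour occlusion confidence in [0, 1]
    Returns (aggregated_embedding, attention_weights).
    """
    if not neighbour_feats:
        return list(query), []
    dim = len(query)
    # Scaled dot-product scores between the query and each neighbour.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(dim)
        for feat in neighbour_feats
    ]
    top = max(scores)
    exp_scores = [math.exp(s - top) for s in scores]
    # Gate: a neighbour with occlusion confidence 1.0 contributes nothing.
    gated = [e * (1.0 - o) for e, o in zip(exp_scores, occlusion_conf)]
    total = sum(gated)
    if total <= 1e-12:
        # Every neighbour is fully occluded: keep the node's own embedding.
        return list(query), [0.0] * len(gated)
    weights = [g / total for g in gated]
    aggregated = [
        sum(w * feat[c] for w, feat in zip(weights, neighbour_feats))
        for c in range(dim)
    ]
    return aggregated, weights
```

Gating before renormalisation means a heavily occluded neighbour loses influence to visible ones rather than merely shrinking the output, which matches the stated aim of keeping corrupted embeddings out of the aggregation.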

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The altitude proxy derived from object area could be replaced by direct metadata when available, potentially simplifying the model for calibrated cameras.
The same node-type distinction and gating logic might transfer to ground-based tracking scenarios that also exhibit scale change and partial occlusion.
Pairing HDST-GNN with a detector that outputs per-detection occlusion scores would remove the need to derive occlusion from other signals.

Load-bearing premise

The assumption that the three components each independently drive the reported gains, as asserted via ablation studies whose experimental controls are not described.

What would settle it

An ablation experiment on VisDrone2019-MOT in which disabling any one of the three components produces no measurable change in MOTA or identity-switch count would falsify the claim of independent contributions.

Figures

Figures reproduced from arXiv: 2606.05587 by Phillip Jiang.

**Figure 1.** HDST-GNN pipeline. The AppearanceExtractor extracts embeddings from frame crops. The GraphBuilder constructs a heterogeneous graph with altitude-adaptive edge radius (C1) and three node types (C2). The HDST-GNN applies occlusion-gated attention (C3) over five typed edge relations to refine embeddings. The Association Head uses Sinkhorn matching during training and Hungarian matching during inference.

**Figure 2.** Altitude-adaptive radius (C1). High-altitude frame (left): mean object area ā ≈ 120 px², ẑ ≈ 1.2, r_eff ≈ 110 px. Low-altitude frame (right): ā ≈ 900 px², ẑ ≈ −0.8, r_eff ≈ 240 px. Circles show the connectivity radius around each detection node.

**Figure 3.** Qualitative comparison. Top: re-identification after occlusion. Bottom: tracking across an altitude change. Coloured bounding boxes denote track IDs (consistent colour = consistent identity). ID switches are highlighted with dashed red borders. Results shown on validation sequences uav0000305 and uav0000339, where HDST-GNN achieves the largest MOTA gains over SORT (+9.68 and +8.29 pp). The ID-switch count…

**Figure 4.** Distribution of r_eff values (Equation 3) across VisDrone2019-MOT validation frames as a function of ẑ. The adaptive curve (blue) tracks the oracle optimal radius (grey) more closely than the fixed baseline (red dashed).

5 Discussion. Strengths. HDST-GNN’s altitude-adaptive radius directly addresses a systematic failure mode of fixed-radius graph trackers on UAV data. The heterogeneous node representation n…

Original abstract

Multi-object tracking (MOT) from UAV imagery presents unique challenges: altitude varies across sequences, objects are small and densely packed, and frequent occlusion causes identity switches. Existing graph-based trackers assume fixed spatial context and treat all objects uniformly, ignoring the heterogeneous lifecycle states of detections, active tracklets, and lost targets. We propose HDST-GNN, a Heterogeneous Dynamic Spatiotemporal Graph Neural Network with three novel contributions. First, Altitude-Adaptive Edge Construction estimates a camera-altitude proxy from mean object area and adjusts the graph connectivity radius accordingly. Second, Heterogeneous Node Representation models detections (Type-D), confirmed tracklets (Type-T), and lost tracklets (Type-L) as distinct node types with dedicated projections and typed edge relations. Third, Occlusion-Gated Temporal Aggregation gates each node's attention contribution by its occlusion confidence, preventing occluded nodes from corrupting neighbour embeddings. HDST-GNN is trained end-to-end with a differentiable Sinkhorn head using joint cross-entropy and triplet loss. On VisDrone2019-MOT with oracle detections, HDST-GNN achieves 94.51% MOTA and 97.24% IDF1, outperforming SORT by +5.0 MOTA points and reducing identity switches by 81%. With real YOLOv8n detections, HDST-GNN reduces identity switches by 49% vs. SORT. Ablation studies confirm the independent contribution of each component.
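The Sinkhorn head mentioned in the abstract relies on Sinkhorn normalisation, which turns a score matrix into an approximately doubly stochastic soft assignment by alternating row and column normalisation. The sketch below is a generic log-domain version, not the paper's head. It assumes a square score matrix and omits the extra "unmatched" row and column that a tracker typically needs for births and deaths. The `temperature` and `n_iters` defaults are illustrative.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sinkhorn(scores, n_iters=50, temperature=1.0):
    """Return a soft assignment matrix from a square matrix of match scores."""
    log_p = [[s / temperature for s in row] for row in scores]
    n_rows, n_cols = len(log_p), len(log_p[0])
    for _ in range(n_iters):
        # Row step: make every row sum to one (in log space).
        for i in range(n_rows):
            z = logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        # Column step: make every column sum to one.
        for j in range(n_cols):
            z = logsumexp([log_p[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]
```

Every step is a composition of smooth operations, which is why the same procedure written in an autodiff framework can be trained end-to-end. Lower temperatures sharpen the assignment but generally need more iterations to converge.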

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

HDST-GNN adds altitude-adaptive edges, typed nodes, and occlusion gating to graph tracking and shows clear ID-switch reductions on VisDrone, but the ablation evidence is too thin to confirm the mechanisms.

The paper's core move is to build a heterogeneous dynamic spatiotemporal GNN that treats detections, active tracklets, and lost tracklets as distinct node types, scales edge radius by an altitude proxy from object size, and gates temporal attention by occlusion. It reports 94.51 MOTA and 97.24 IDF1 on VisDrone2019-MOT with oracle boxes, beating SORT by 5 points and cutting ID switches by 81 percent; the gap stays meaningful with YOLOv8n detections.

Those three changes are the actual novelty. The typed nodes and occlusion gate are straightforward responses to the UAV setting where objects change scale and disappear often. Reporting both oracle and detector-based numbers is useful and shows the method is not just tuned to perfect inputs.

The weak point is the ablation claim. The abstract says the studies confirm independent contributions, yet gives no protocol, no per-component metric deltas, and no check that parameter count or training schedule stayed constant. Without those controls the +5 MOTA and ID-switch drops could come from extra capacity rather than the stated mechanisms. That is the load-bearing assumption and it is not yet supported in the provided text.

The work is aimed at people already doing graph-based MOT on aerial data. A reader who needs concrete numbers on VisDrone and is willing to implement the three tweaks themselves can extract value. It is not yet ready for broad citation because the attribution of gains is not demonstrated.

A serious editor should send it to review so the full methods, exact ablation tables, and any statistical checks can be examined. The ideas are practical enough that the gaps are worth fixing rather than rejecting outright.

Referee Report

1 major / 0 minor

Summary. The paper proposes HDST-GNN, a heterogeneous dynamic spatiotemporal graph neural network for multi-object tracking in UAV aerial imagery. It introduces three components: altitude-adaptive edge construction that estimates a camera-altitude proxy from mean object area to adjust graph connectivity radius; heterogeneous node representations distinguishing Type-D (detections), Type-T (confirmed tracklets), and Type-L (lost tracklets) with dedicated projections and typed relations; and occlusion-gated temporal aggregation that modulates attention by occlusion confidence. The model is trained end-to-end with a differentiable Sinkhorn head using joint cross-entropy and triplet loss. On VisDrone2019-MOT with oracle detections it reports 94.51% MOTA and 97.24% IDF1, outperforming SORT by +5.0 MOTA points and reducing identity switches by 81%; with YOLOv8n detections it reduces identity switches by 49%. Ablation studies are stated to confirm the independent contribution of each component.
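For reference, the metrics quoted in the summary are conventionally defined as below. These are the standard CLEAR-MOT and identity-based definitions rather than anything specific to this paper, whose exact evaluation settings are not given on this page. Here FN, FP, IDSW, and GT are per-frame false negatives, false positives, identity switches, and ground-truth objects, and IDTP, IDFP, and IDFN are identity-level true positives, false positives, and false negatives.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}}

\text{relative ID-switch reduction} = \frac{\mathrm{IDSW}_{\mathrm{SORT}} - \mathrm{IDSW}_{\mathrm{HDST\text{-}GNN}}}{\mathrm{IDSW}_{\mathrm{SORT}}}
```

Because identity switches enter MOTA only as one term of the numerator, a large relative drop in switches can coincide with a modest change in MOTA, which is consistent with the reported 81% reduction alongside a +5.0 point gain.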

Significance. If the reported gains hold under controlled evaluation, the targeted handling of altitude variation and occlusion via graph structure could advance UAV-specific MOT, particularly for dense small-object scenarios. The end-to-end differentiable Sinkhorn head is a methodological strength that enables joint optimization of embeddings and assignment.
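The joint objective can be sketched as a cross-entropy term on the soft assignment matrix plus a triplet term on embeddings. The page does not state how the two terms are weighted or how triplets are mined, so `triplet_weight`, `margin`, and the function names below are hypothetical. The code computes only forward values; in practice the same expressions would be written in an autodiff framework.

```python
import math


def assignment_cross_entropy(soft_assign, target_cols, eps=1e-9):
    """Mean negative log-probability of the ground-truth column for each row."""
    total = 0.0
    for row_idx, col_idx in enumerate(target_cols):
        total -= math.log(soft_assign[row_idx][col_idx] + eps)
    return total / len(target_cols)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on Euclidean distances: pull positive closer than negative by margin."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def joint_loss(soft_assign, target_cols, triplets, triplet_weight=1.0):
    """Cross-entropy on assignments plus a weighted mean triplet loss."""
    ce = assignment_cross_entropy(soft_assign, target_cols)
    tri = sum(triplet_loss(a, p, n) for a, p, n in triplets) / max(len(triplets), 1)
    return ce + triplet_weight * tri
```

The cross-entropy term supervises the matching directly, while the triplet term shapes the embedding space the matching scores are computed from.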

major comments (1)

[Abstract] The statement that 'ablation studies confirm the independent contribution of each component' is given without protocol details (e.g., exact variants tested, metric deltas per component, or controls for parameter count and training schedule). This is load-bearing for the central claim that the +5.0 MOTA gain and 81% ID-switch reduction are attributable to altitude-adaptive edges, heterogeneous nodes, and occlusion gating rather than capacity or tuning differences.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below.

Referee: [Abstract] The statement that 'ablation studies confirm the independent contribution of each component' is given without protocol details (e.g., exact variants tested, metric deltas per component, or controls for parameter count and training schedule). This is load-bearing for the central claim that the +5.0 MOTA gain and 81% ID-switch reduction are attributable to altitude-adaptive edges, heterogeneous nodes, and occlusion gating rather than capacity or tuning differences.

Authors: We agree that the abstract statement lacks the protocol details required to support the claim. The full ablation studies, including exact variants tested, per-component metric deltas, and controls for parameter count and training schedule, are reported in Section 4.3 of the manuscript. Given the length constraints of an abstract, we will revise the abstract to remove the sentence asserting that ablation studies confirm the independent contribution of each component. This change ensures the abstract contains only claims that can be fully substantiated within its text, while the attribution of gains remains supported by the detailed experiments in the body of the paper.

revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper describes an empirical GNN architecture for multi-object tracking, with three proposed components trained end-to-end using standard cross-entropy and triplet losses plus a differentiable Sinkhorn head. No mathematical derivation, equations, or first-principles chain is presented that could reduce to its own inputs by construction. Performance claims rest on reported metrics from VisDrone2019-MOT experiments rather than any self-referential fitting or self-citation load-bearing step. Absence of ablation protocol details is a methodological gap but does not create circularity under the defined patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Review based on abstract only; the model introduces new modeling choices for graph construction and node representations that function as additional design decisions.

free parameters (1)

altitude proxy scaling factor
Derived from mean object area to adjust connectivity radius; exact functional form and any fitted constants not specified in abstract.

axioms (1)

standard math · The Sinkhorn algorithm can be used differentiably for assignment in tracking
Invoked for the end-to-end training with the matching head.

invented entities (1)

Type-D, Type-T, Type-L node types · no independent evidence
purpose: To model different lifecycle states of objects in tracking
New node type distinctions introduced in the model architecture.

pith-pipeline@v0.9.1-grok · 5791 in / 1440 out tokens · 77966 ms · 2026-06-28T02:44:03.844461+00:00 · methodology

Reference graph

Works this paper leans on

26 extracted references · 5 canonical work pages · 3 internal anchors

[1] Zhu, P.; Wen, L.; Du, D.; Bian, X.; Ling, H.; Hu, Q.; Nie, J.; Cheng, H.; Liu, C.; Liu, X.; et al. VisDrone-MOT2019: The Vision Meets Drone Multiple Object Tracking Challenge Results. ICCV Workshops 2019.
[2] Fan, H.; Ling, H. VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results. ICCV Workshops 2021.
[3] Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. ICIP 2016, 3464–3468.
[4] Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. ICIP 2017, 3645–3649.
[5] Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. ECCV 2022, 1–21.
[6] Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. CVPR 2023.
[7] Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. IEEE Trans. Multimedia 2023, 25, 8725–8737.
[8] Brasó, G.; Leal-Taixé, L. Learning a Neural Solver for Multiple Object Tracking. CVPR 2020, 6247–6257.
[9] Papakis, I.; Sarkar, A.; Bhattacharyya, A. GCNNMatch: Graph Convolutional Neural Networks for Multi-Object Tracking via Sinkhorn Normalization. arXiv:2010.00067, 2020.
[10] Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards Realtime Multi-Object Tracking. ECCV 2020, 107–122.
[11] Meinhardt, T.; Kirillov, A.; Leal-Taixé, L.; Feichtenhofer, C. TrackFormer: Multi-Object Tracking with Transformers. CVPR 2022, 8844–8854.
[12] Zeng, F.; Dong, B.; Zhang, Y.; Wang, T.; Zhang, X.; Wei, Y. MOTR: End-to-End Multiple-Object Tracking with Transformer. ECCV 2022, 145–161.
[13] Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO (Version 8.0.0). GitHub 2023. Available online: https://github.com/ultralytics/ultralytics
[14] Qian, H.; Sun, X.; Guo, R.; Su, S.; Ding, B.; Guo, X. Low-Altitude Multi-Object Tracking via Graph Neural Networks with Cross-Attention and Reliable Neighbor Guidance. Remote Sens. 2025, 17, 3502. https://doi.org/10.3390/rs17203502
[15] Sarlin, P.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching with Graph Neural Networks. CVPR 2020, 4938–4947.
[16] Hermans, A.; Beyer, L.; Leibe, B. In Defense of the Triplet Loss for Person Re-Identification. arXiv:1703.07737, 2017.
[17] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. CVPR 2016, 770–778.
[18] Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCV Workshops 2016, 17–35.
[19] Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking. IJCV 2021, 129, 548–578.
[20] Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NeurIPS 2015, 91–99.
[21] Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv:1804.02767, 2018.
[22] Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. CVPR 2017, 2117–2125.
[23] Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered Object Detection in Aerial Images. ICCV 2019, 8311–8320.
[24] Bai, Y.; Zhang, Y.; Ding, M.; Ghanem, B. Finding Tiny Faces in the Wild with Generative Adversarial Network. CVPR 2018, 21–30.
[25] Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking. ECCV 2018, 375–391.
[26] Zheng, L.; Yang, Y.; Hauptmann, A. G. Person Re-Identification: Past, Present and Future. arXiv:1610.02984, 2016.
