NOOUGAT: Towards Unified Online and Offline Multi-Object Tracking

Benjamin Missaoui; Guillem Bras\'o; Laura Leal-Taix\'e; Orcun Cetintas; Tim Meinhardt

arxiv: 2509.02111 · v2 · submitted 2025-09-02 · 💻 cs.CV

NOOUGAT: Towards Unified Online and Offline Multi-Object Tracking

Benjamin Missaoui , Orcun Cetintas , Guillem Bras\'o , Tim Meinhardt , Laura Leal-Taix\'e This is my paper

Pith reviewed 2026-05-18 19:54 UTC · model grok-4.3

classification 💻 cs.CV

keywords multi-object trackingonline trackingoffline trackinggraph neural networkautoregressive trackingtemporal fusionobject association

0 comments

The pith

NOOUGAT unifies online and offline multi-object tracking by processing non-overlapping subclips with a graph neural network and autoregressive fusion layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NOOUGAT as a single system that removes the traditional split between online trackers, which work frame by frame but falter on long occlusions, and offline trackers, which use larger time windows but rely on ad-hoc stitching. It breaks input videos into non-overlapping subclips that a graph neural network processes locally, then uses a new autoregressive long-term tracking layer to connect identities across those subclips. The size of each subclip becomes a dial that trades off processing delay against how much future context the tracker sees, supporting everything from real-time use to full-batch analysis. If the approach holds, practitioners no longer need separate models or rules for different latency requirements, and the reported accuracy lifts on DanceTrack, SportsMOT, and MOT20 indicate that the unified route also improves association quality in both regimes.

Core claim

NOOUGAT is the first tracker designed to operate with arbitrary temporal horizons. It leverages a unified Graph Neural Network framework that processes non-overlapping subclips and fuses them through a novel Autoregressive Long-term Tracking layer. The subclip size controls the trade-off between latency and temporal context, enabling a wide range of deployment scenarios from frame-by-frame to batch processing. It achieves state-of-the-art performance across both tracking regimes, improving online AssA by +2.3 on DanceTrack, +9.2 on SportsMOT, and +5.0 on MOT20, with even greater gains in offline mode.

What carries the argument

The Autoregressive Long-term Tracking (ALT) layer, which fuses object associations across non-overlapping subclips inside a graph neural network to maintain identity consistency over variable time spans.

If this is right

Subclip length becomes a single tunable parameter that trades latency for temporal context without retraining or redesigning the tracker.
The same trained model supports both low-latency frame-by-frame operation and higher-accuracy batch processing of entire videos.
Long-term occlusions are handled through autoregressive fusion rather than hand-crafted association rules or post-hoc stitching.
Performance gains appear in both online and offline regimes on standard benchmarks such as DanceTrack, SportsMOT, and MOT20.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The subclip-and-fuse design could extend naturally to streaming settings where data arrives in irregular bursts rather than fixed frames.
Because the core representation is a graph neural network, additional cues such as appearance embeddings or 3D motion could be added as node or edge features without changing the overall architecture.
Testing on videos lasting hours rather than minutes would reveal whether drift remains bounded or requires periodic global re-optimization.

Load-bearing premise

The Autoregressive Long-term Tracking layer can reliably fuse information across non-overlapping subclips without accumulating identity switches or drift over arbitrarily long sequences.

What would settle it

Measuring identity-switch rate and association accuracy on sequences many times longer than the chosen subclip size while keeping the same model and subclip size fixed.

read the original abstract

The long-standing division between \textit{online} and \textit{offline} Multi-Object Tracking (MOT) has led to fragmented solutions that fail to address the flexible temporal requirements of real-world deployment scenarios. Current \textit{online} trackers rely on frame-by-frame hand-crafted association strategies and struggle with long-term occlusions, whereas \textit{offline} approaches can cover larger time gaps, but still rely on heuristic stitching for arbitrarily long sequences. In this paper, we introduce NOOUGAT, the first tracker designed to operate with arbitrary temporal horizons. NOOUGAT leverages a unified Graph Neural Network (GNN) framework that processes non-overlapping subclips, and fuses them through a novel Autoregressive Long-term Tracking (ALT) layer. The subclip size controls the trade-off between latency and temporal context, enabling a wide range of deployment scenarios, from frame-by-frame to batch processing. NOOUGAT achieves state-of-the-art performance across both tracking regimes, improving \textit{online} AssA by +2.3 on DanceTrack, +9.2 on SportsMOT, and +5.0 on MOT20, with even greater gains in \textit{offline} mode.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces NOOUGAT, the first multi-object tracker designed to operate with arbitrary temporal horizons. It employs a unified Graph Neural Network (GNN) that processes fixed-size non-overlapping subclips and fuses their outputs via a novel Autoregressive Long-term Tracking (ALT) layer; subclip size is the sole control for the latency-context trade-off. The paper reports state-of-the-art results, with online AssA gains of +2.3 on DanceTrack, +9.2 on SportsMOT, and +5.0 on MOT20, and larger gains in offline mode.

Significance. If the unification and arbitrary-horizon claims hold, the work would meaningfully advance MOT by removing the online/offline dichotomy and enabling flexible deployment. The reported numeric gains on standard benchmarks indicate practical value for long-term association under occlusion.

major comments (2)

[ALT layer description (methods)] ALT layer description (methods): the autoregressive fusion of non-overlapping subclips is asserted to support arbitrary horizons, yet the text supplies no explicit anti-drift mechanism (periodic global re-matching, uncertainty-aware propagation, or re-optimization) that would prevent compounding of independent association errors across subclips. This is load-bearing for the central claim.
[Results section] Results section: while specific AssA deltas are stated, no ablation or error analysis is provided on how performance or identity-switch rate scales with the number of subclips (i.e., sequence length). Without such evidence the arbitrary-horizon guarantee remains unverified.

minor comments (2)

[Abstract] Abstract: the phrase 'even greater gains in offline mode' is left unquantified; supplying the corresponding numeric improvements would aid immediate assessment.
[Notation] Notation: confirm that 'AssA' and related metrics are defined consistently on first use and that all dataset names receive standard citations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and thoughtful review of our manuscript. The comments raise important points about the robustness of the ALT layer and the verification of arbitrary-horizon performance. We address each comment below and outline the revisions we plan to make.

read point-by-point responses

Referee: [ALT layer description (methods)] ALT layer description (methods): the autoregressive fusion of non-overlapping subclips is asserted to support arbitrary horizons, yet the text supplies no explicit anti-drift mechanism (periodic global re-matching, uncertainty-aware propagation, or re-optimization) that would prevent compounding of independent association errors across subclips. This is load-bearing for the central claim.

Authors: We appreciate the referee highlighting this aspect of the ALT layer. The manuscript describes the ALT layer as an autoregressive mechanism that fuses outputs from consecutive subclips by using the association graph from the previous subclip to initialize the next. This design allows the model to carry forward identity information across subclips without resetting. While we do not introduce an additional explicit anti-drift module such as periodic re-matching, the end-to-end training of the GNN on sequences with varying lengths enables the network to learn robust propagation that minimizes error accumulation. The SOTA results on long sequences in MOT20 and other datasets provide empirical support for this. To strengthen the paper, we will revise the methods section to explicitly discuss potential error propagation and how the architecture addresses it through learned representations. revision: yes
Referee: [Results section] Results section: while specific AssA deltas are stated, no ablation or error analysis is provided on how performance or identity-switch rate scales with the number of subclips (i.e., sequence length). Without such evidence the arbitrary-horizon guarantee remains unverified.

Authors: We agree that demonstrating how performance scales with the number of subclips would further validate the arbitrary-horizon claim. In the current manuscript, we focus on reporting results for the full sequences in the benchmarks, which inherently involve multiple subclips for longer videos. However, we did not provide a dedicated ablation varying the subclip count or analyzing identity switches as a function of sequence length. We will add such an analysis in the revised version, either through additional experiments or by breaking down the results on existing data by sequence length where possible. This will help verify the scaling behavior. revision: yes

Circularity Check

0 steps flagged

No circularity: method is architectural design with empirical validation

full rationale

The paper introduces NOOUGAT as a unified GNN-based tracker with a novel Autoregressive Long-term Tracking (ALT) layer that processes non-overlapping subclips for arbitrary horizons. The subclip size is presented as an explicit design knob trading latency for context, and performance gains are reported as experimental results on DanceTrack, SportsMOT, and MOT20. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the derivation chain. The central construction (GNN + ALT fusion) is a proposed architecture whose correctness is left to empirical evaluation rather than reducing to its own inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete free parameters, axioms, or invented entities; the method description implies standard GNN message-passing and autoregressive recurrence but does not enumerate them.

pith-pipeline@v0.9.0 · 5765 in / 1154 out tokens · 45127 ms · 2026-05-18T19:54:06.574016+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

NOOUGAT leverages a unified Graph Neural Network (GNN) framework that processes non-overlapping subclips, and fuses them through a novel Autoregressive Long-term Tracking (ALT) layer. The subclip size controls the trade-off between latency and temporal context
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce the ALT layer, a fully learnable and data-driven GNN association module that dynamically uses the most relevant cues across various temporal contexts

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · 4 internal anchors

[1]

Ding S, Schneider L, Cordts M, Gall J.: ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

work page
[2]

Explor- ing Simple 3D Multi-Object Tracking for Autonomous Driving

Luo C, Yang X, Yuille A. Explor- ing Simple 3D Multi-Object Tracking for Autonomous Driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2021. p. 10488– 10497

work page 2021
[3]

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learn- ing

Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, et al. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learn- ing. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

work page
[4]

SPAMming Labels: Efficient Anno- tations for the Trackers of Tomorrow

Cetintas O, Meinhardt T, Bras´ o G, Leal- Taix´ e L. SPAMming Labels: Efficient Anno- tations for the Trackers of Tomorrow. In: European Conference on Computer Vision (ECCV); 2024

work page 2024
[5]

Effi- ciently Scaling Up Video Annotation with Crowdsourced Marketplaces

Vondrick C, Ramanan D, Patterson D. Effi- ciently Scaling Up Video Annotation with Crowdsourced Marketplaces. In: Daniilidis K, Maragos P, Paragios N, editors. Computer Vision – ECCV 2010. Berlin, Heidelberg: Springer Berlin Heidelberg; 2010. p. 610–623

work page 2010
[6]

DiffMOT: A Real-time Diffusion- based Multiple Object Tracker with Non- linear Prediction

Lv W, Huang Y, Zhang N, Lin RS, Han M, Zeng D. DiffMOT: A Real-time Diffusion- based Multiple Object Tracker with Non- linear Prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024. p. 19321– 19330

work page 2024
[7]

Available from: https:// arxiv.org/abs/2303.10404

Qin Z, Zhou S, Wang L, Duan J, Hua G, Tang W.: MotionTrack: Learning Robust Short-term and Long-term Motions for Multi- Object Tracking. Available from: https:// arxiv.org/abs/2303.10404

work page arXiv
[8]

Observation-centric sort: Rethinking sort for robust multi-object tracking

Cao J, Pang J, Weng X, Khirodkar R, Kitani K. Observation-centric sort: Rethinking sort for robust multi-object tracking. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

work page
[9]

Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking? In: Oh AH, Agarwal A, Belgrave D, Cho K, editors

Dendorfer P, Yugay V, Osep A, Leal-Taix´ e L. Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking? In: Oh AH, Agarwal A, Belgrave D, Cho K, editors. Advances in Neural Information Pro- cessing Systems; 2022. Available from: https: //openreview.net/forum?id=3r0yLLCo4fF

work page 2022
[10]

Learning an Image-Based Motion Context for Multiple People Track- ing

Leal-Taix´ e L, Fenzi M, Kuznetsova A, Rosen- hahn B, Savarese S. Learning an Image-Based Motion Context for Multiple People Track- ing. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition; 2014. p. 3542–3549

work page 2014
[11]

Available from: https://arxiv.org/abs/2003.08177

Wang G, Yang S, Liu H, Wang Z, Yang Y, Wang S, et al.: High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification. Available from: https://arxiv.org/abs/2003.08177

work page arXiv 2003
[12]

Unsupervised Pre-training for Person Re-identification

Fu D, Chen D, Bao J, Yang H, Yuan L, Zhang L, et al. Unsupervised Pre-training for Person Re-identification. Proceedings of the IEEE conference on computer vision and pattern recognition. 2021

work page 2021
[13]

TransReID: Transformer-Based Object Re-Identification

He S, Luo H, Wang P, Wang F, Li H, Jiang W. TransReID: Transformer-Based Object Re-Identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2021. p. 15013– 15022

work page 2021
[14]

Key- point Promptable Re-Identification

Somers V, Alahi A, Vleeschouwer CD. Key- point Promptable Re-Identification. In: Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, Septem- ber 29-October 4, 2024, Proceedings, Part LXXIX. vol. 15137 of Lecture Notes in Computer Science. Springer; 2024. p. 216–

work page 2024
[15]

1007/978-3-031-72986-7 13

Available from: https://doi.org/10. 1007/978-3-031-72986-7 13

work page
[16]

Simple online and realtime tracking

Bewley A, Ge Z, Ott L, Ramos F, Upcroft B. Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP); 2016. p. 3464–3468

work page 2016
[17]

Simple Online and Realtime Tracking with a Deep Association Metric

Wojke N, Bewley A, Paulus D. Simple Online and Realtime Tracking with a Deep Association Metric. In: 2017 IEEE Inter- national Conference on Image Processing (ICIP). IEEE; 2017. p. 3645–3649

work page 2017
[18]

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, et al. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. Proceedings of the European Conference on Computer Vision (ECCV). 2022

work page 2022
[19]

Hybrid-sort: Weak cues matter for online multi-object tracking

Yang M, Han G, Yan B, Zhang W, Qi J, Lu H, et al. Hybrid-sort: Weak cues matter for online multi-object tracking. In: Proceed- ings of the AAAI Conference on Artificial Intelligence. vol. 38; 2024. p. 6504–6512

work page 2024
[20]

MOTR: End-to-End Multiple- Object Tracking with Transformer

Zeng F, Dong B, Zhang Y, Wang T, Zhang X, Wei Y. MOTR: End-to-End Multiple- Object Tracking with Transformer. In: 14 Computer Vision – ECCV 2022: 17th Euro- pean Conference, Tel Aviv, Israel, Octo- ber 23–27, 2022, Proceedings, Part XXVII. Berlin, Heidelberg: Springer-Verlag; 2022. p. 659–675. Available from: https://doi.org/10. 1007/978-3-031-19812-0 38

work page 2022
[21]

MOTRv2: Boot- strapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

Zhang Y, Wang T, Zhang X. MOTRv2: Boot- strapping End-to-End Multi-Object Tracking by Pretrained Object Detectors. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE

work page 2023
[22]

& Chen, C

p. 22056–22065. Available from: http:// dx.doi.org/10.1109/CVPR52729.2023.02112

work page doi:10.1109/cvpr52729.2023.02112 2023
[23]

MeMOTR: Long- Term Memory-Augmented Transformer for Multi-Object Tracking

Gao R, Wang L. MeMOTR: Long- Term Memory-Augmented Transformer for Multi-Object Tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2023. p. 9901– 9910

work page 2023
[24]

Available from: https://arxiv.org/abs/2403.16848

Gao R, Qi J, Wang L.: Multiple Object Tracking as ID Prediction. Available from: https://arxiv.org/abs/2403.16848

work page arXiv
[25]

Learning a Neural Solver for Multiple Object Tracking

Bras´ o G, Leal-Taix´ e L. Learning a Neural Solver for Multiple Object Tracking. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2020

work page 2020
[26]

Multi- Object Tracking and Segmentation Via Neu- ral Message Passing

Bras´ o G, Cetintas O, Leal-Taix´ e L. Multi- Object Tracking and Segmentation Via Neu- ral Message Passing. International Journal of Computer Vision. 2022;https://doi.org/10. 1007/s11263-022-01678-6

work page 2022
[27]

Uni- fying Short and Long-Term Tracking With Graph Hierarchies

Cetintas O, Bras´ o G, Leal-Taix´ e L. Uni- fying Short and Long-Term Tracking With Graph Hierarchies. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023. p. 22877–22887

work page 2023
[28]

Multi- scene generalized trajectory global graph solver with composite nodes for multiple object tracking

Gao Y, Xu H, Li J, Wang N, Gao X. Multi- scene generalized trajectory global graph solver with composite nodes for multiple object tracking. In: Proceedings of the Thirty-Eighth AAAI Conference on Artifi- cial Intelligence and Thirty-Sixth Confer- ence on Innovative Applications of Artifi- cial Intelligence and Fourteenth Symposium on Educational Advances...

work page doi:10.1609/aaai.v38i3.27953 2024
[29]

The Architectural Implications of Autonomous Driving: Con- straints and Acceleration

Lin SC, Zhang Y, Hsu CH, Skach M, Haque ME, Tang L, et al. The Architectural Implications of Autonomous Driving: Con- straints and Acceleration. SIGPLAN Not. 2018 Mar;53(2):751–766. https://doi.org/10. 1145/3296957.3173191

work page arXiv 2018
[30]

VETRA: A Dataset for Vehicle Tracking in Aerial Imagery – New Challenges for Multi-Object Tracking

Hellekes J, M¨ uhlhaus M, Bahmanyar R, Azimi SM, Kurz F. VETRA: A Dataset for Vehicle Tracking in Aerial Imagery – New Challenges for Multi-Object Tracking. In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, Septem- ber 29–October 4, 2024, Proceedings, Part LXXXV. Berlin, Heidelberg: Springer-Verlag

work page 2024
[31]

p. 52–70. Available from: https://doi. org/10.1007/978-3-031-73013-9 4

work page doi:10.1007/978-3-031-73013-9
[32]

KIT - Insti- tute of Photogrammetry and Remote Sensing (IPF)

Schmidt F.: Data Set for Tracking Vehicles in Aerial Image Sequences. KIT - Insti- tute of Photogrammetry and Remote Sensing (IPF). https://www.ipf.kit.edu/downloads data set AIS vehicle tracking.php

work page
[33]

Available from: https://arxiv.org/ abs/2405.15755

Han X, Oishi N, Tian Y, Ucurum E, Young R, Chatwin C, et al.: ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking. Available from: https://arxiv.org/ abs/2405.15755

work page arXiv
[34]

MambaTrack: A Simple Baseline for Multiple Object Track- ing with State Space Model

Xiao C, Cao Q, Luo Z, Lan L. MambaTrack: A Simple Baseline for Multiple Object Track- ing with State Space Model. In: Proceedings of the 32nd ACM International Conference on Multimedia. MM ’24. New York, NY, USA: Association for Computing Machinery; 2024. p. 4082–4091. Available from: https://doi. org/10.1145/3664647.3680944

work page doi:10.1145/3664647.3680944 2024
[35]

Focus On Details: Online Multi- Object Tracking with Diverse Fine-Grained Representation

Ren H, Han S, Ding H, Zhang Z, Wang H, Wang F. Focus On Details: Online Multi- Object Tracking with Diverse Fine-Grained Representation. In: 2023 IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR); 2023. p. 11289–11298. 15

work page 2023
[36]

Learning by tracking: Siamese CNN for robust target association

Leal-Taix´ e L, Canton-Ferrer C, Schindler K. Learning by tracking: Siamese CNN for robust target association. CoRR. 2016;abs/1604.07866. 1604.07866

work page internal anchor Pith review Pith/arXiv arXiv 2016
[37]

BoT- SORT: Robust associations multi-pedestrian tracking

Aharon N, Orfaig R, Bobrovsky BZ.: BoT- SORT: Robust Associations Multi-Pedestrian Tracking. Available from: https://arxiv.org/ abs/2206.14651

work page arXiv
[38]

Available from: https://arxiv.org/abs/2206.04656

Seidenschwarz J, Bras´ o G, Serrano VC, Elezi I, Leal-Taix´ e L.: Simple Cues Lead to a Strong Multi-Object Tracker. Available from: https://arxiv.org/abs/2206.04656

work page arXiv
[39]

End-to- End Object Detection with Transform- ers

Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to- End Object Detection with Transform- ers. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I. Berlin, Heidelberg: Springer-Verlag; 2020. p. 213–229. Available from: https://doi.org/10. 1007/978-3-030-58452-8 13

work page 2020
[40]

TrackFormer: Multi-Object Tracking with Transformers

Meinhardt T, Kirillov A, Leal-Taixe L, Feichtenhofer C. TrackFormer: Multi-Object Tracking with Transformers. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2022

work page 2022
[41]

MeMOT: Multi-Object Tracking with Memory

Cai J, Xu M, Li W, Xiong Y, Xia W, Tu Z, et al. MeMOT: Multi-Object Tracking with Memory. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022. p. 8080–8090

work page 2022
[42]

Tracking without bells and whistles

Bergmann P, Meinhardt T, Leal-Taix´ e L. Tracking without bells and whistles. In: The IEEE International Conference on Computer Vision (ICCV); 2019

work page 2019
[43]

How To Train Your Deep Multi-Object Tracker

Xu Y, Osep A, Ban Y, Horaud R, Leal- Taix´ e L, Alameda-Pineda X. How To Train Your Deep Multi-Object Tracker. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

work page
[44]

Strongsort: Make deepsort great again

Du Y, Zhao Z, Song Y, Zhao Y, Su F, Gong T, et al. Strongsort: Make deepsort great again. IEEE Transactions on Multimedia. 2023

work page 2023
[45]

Global Tracking Transformers

Zhou X, Yin T, Koltun V, Kr¨ ahenb¨ uhl P. Global Tracking Transformers. In: CVPR

work page
[46]

Multiple object tracking using k- shortest paths optimization

Berclaz J, Fleuret F, Turetken E, Fua P. Multiple object tracking using k- shortest paths optimization. IEEE TPAMI. 2011;33(9):1806–1819

work page 2011
[47]

Global data associ- ation for multi-object tracking using network flows

Zhang L, Li Y, Nevatia R. Global data associ- ation for multi-object tracking using network flows. In: CVPR; 2008

work page 2008
[48]

Multiple People Tracking by Lifted Multi- cut and Person Re-Identification

Tang S, Andriluka M, Andres B, Schiele B. Multiple People Tracking by Lifted Multi- cut and Person Re-Identification. In: CVPR

work page
[49]

Gmcp- tracker: Global multi-object tracking using generalized minimum clique graphs

Zamir AR, Dehghan A, Shah M. Gmcp- tracker: Global multi-object tracking using generalized minimum clique graphs. In: ECCV. Springer; 2012. p. 343–356

work page 2012
[50]

Subgraph Decomposition for Multi-Target Tracking

Tang S, Andres B, Andriluka M, Schiele B. Subgraph Decomposition for Multi-Target Tracking. In: Proceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition (CVPR); 2015

work page 2015
[51]

Lifted disjoint paths with appli- cation in multiple object tracking

Hornakova A, Henschel R, Rosenhahn B, Swoboda P. Lifted disjoint paths with appli- cation in multiple object tracking. In: ICML. PMLR; 2020. p. 4364–4375

work page 2020
[52]

Making Higher Order MOT Scalable: An Efficient Approx- imate Solver for Lifted Disjoint Paths

Hornakova A, Kaiser T, Swoboda P, Rolinek M, Rosenhahn B, Henschel R. Making Higher Order MOT Scalable: An Efficient Approx- imate Solver for Lifted Disjoint Paths. In: ICCV; 2021. p. 6330–6340

work page 2021
[53]

Multi-target Tracking by Lagrangian Relaxation to Min-Cost Network Flow

Butt A, Collins R. Multi-target Tracking by Lagrangian Relaxation to Min-Cost Network Flow. CVPR. 2013

work page 2013
[54]

End-to-End Learning Deep CRF Models for Multi-Object Tracking Deep CRF Models

Xiang J, Xu G, Ma C, Hou J. End-to-End Learning Deep CRF Models for Multi-Object Tracking Deep CRF Models. IEEE Trans- actions on Circuits and Systems for Video Technology. 2021;31(1):275–288. https://doi. org/10.1109/TCSVT.2020.2975842. 16

work page doi:10.1109/tcsvt.2020.2975842 2021
[55]

Multi-Object Tracking Using Color, Texture and Motion

Takala V, Pietikainen M. Multi-Object Tracking Using Color, Texture and Motion. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition; 2007. p. 1–7

work page 2007
[56]

Learning by Tracking: Siamese CNN for Robust Target Association

Leal-Taixe L, Canton-Ferrer C, Schindler K. Learning by Tracking: Siamese CNN for Robust Target Association. In: CVPRW

work page
[57]

Multi- Object Tracking With Quadruplet Convolu- tional Neural Networks

Son J, Baek M, Cho M, Han B. Multi- Object Tracking With Quadruplet Convolu- tional Neural Networks. In: CVPR; 2017

work page 2017
[58]

Features for Multi- Target Multi-Camera Tracking and Re- Identification

Ristani E, Tomasi C. Features for Multi- Target Multi-Camera Tracking and Re- Identification. In: CVPR; 2018

work page 2018
[59]

Tracking the Untrackable: Learning to Track Multi- ple Cues With Long-Term Dependencies

Sadeghian A, Alahi A, Savarese S. Tracking the Untrackable: Learning to Track Multi- ple Cues With Long-Term Dependencies. In: ICCV; 2017

work page 2017
[60]

Online Multi-Target Track- ing Using Recurrent Neural Networks

Milan A, Rezatofighi SH, Dick A, Reid I, Schindler K. Online Multi-Target Track- ing Using Recurrent Neural Networks. In: Proceedings of the Thirty-First AAAI Con- ference on Artificial Intelligence; 2017

work page 2017
[61]

Learning a Proposal Classifier for Mul- tiple Object Tracking

Dai P, Weng R, Choi W, Zhang C, He Z, Ding W. Learning a Proposal Classifier for Mul- tiple Object Tracking. In: CVPR; 2021. p. 2443–2452

work page 2021
[62]

Learn- able graph matching: Incorporating graph partitioning with deep feature learning for multiple object tracking

He J, Huang Z, Wang N, Zhang Z. Learn- able graph matching: Incorporating graph partitioning with deep feature learning for multiple object tracking. In: CVPR; 2021. p. 5299–5309

work page 2021
[63]

Graph Networks for Multiple Object Tracking

Li J, Gao X, Jiang T. Graph Networks for Multiple Object Tracking. In: Proceedings of the IEEE/CVF Winter Conference on Appli- cations of Computer Vision (WACV); 2020

work page 2020
[64]

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi- Feature Learning

Weng X, Wang Y, Man Y, Kitani KM. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi- Feature Learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020

work page 2020
[65]

GSM: Graph Similarity Model for Multi-Object Tracking

Liu Q, Chu Q, Liu B, Yu N. GSM: Graph Similarity Model for Multi-Object Tracking. In: IJCAI; 2020. p. 530–536

work page 2020
[66]

Multi- Object Tracking and Segmentation Via Neu- ral Message Passing

Bras´ o G, Cetintas O, Leal-Taix´ e L. Multi- Object Tracking and Segmentation Via Neu- ral Message Passing. International Journal of Computer Vision. 2022;130(12):3035–3053

work page 2022
[67]

2021 , url =

Wang Y, Kitani K, Weng X. Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE Press; 2021. p. 13708–13715. Available from: https://doi. org/10.1109/ICRA48506.2021.9561110

work page doi:10.1109/icra48506.2021.9561110 2021
[68]

Graph Networks for Multiple Object Tracking

Li J, Gao X, Jiang T. Graph Networks for Multiple Object Tracking. In: 2020 IEEE Winter Conference on Applications of Com- puter Vision (WACV); 2020. p. 708–717

work page 2020
[69]

The Hungarian Method for the Assignment Problem

Kuhn HW. The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly. 1955 March;2(1–2):83–

work page 1955
[70]

https://doi.org/10.1002/nav.3800020109

work page doi:10.1002/nav.3800020109
[71]

Learnable Graph Matching: Incorporating Graph Parti- tioning With Deep Feature Learning for Mul- tiple Object Tracking

He J, Huang Z, Wang N, Zhang Z. Learnable Graph Matching: Incorporating Graph Parti- tioning With Deep Feature Learning for Mul- tiple Object Tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021. p. 5299–5309

work page 2021
[72]

Global data associ- ation for multi-object tracking using network flows

Zhang L, Li Y, Nevatia R. Global data associ- ation for multi-object tracking using network flows. In: 2008 IEEE Conference on Com- puter Vision and Pattern Recognition. IEEE

work page 2008
[73]

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Sun P, Cao J, Jiang Y, Yuan Z, Bai S, Kitani K, et al. DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022. . 17

work page 2022
[74]

SportsMOT: A Large Multi-Object Track- ing Dataset in Multiple Sports Scenes

Cui Y, Zeng C, Zhao X, Yang Y, Wu G, Wang L. SportsMOT: A Large Multi-Object Track- ing Dataset in Multiple Sports Scenes. arXiv preprint arXiv:230405170. 2023

work page 2023
[75]

Available from: https://arxiv

Dendorfer P, Oˇ sep A, Milan A, Schindler K, Cremers D, Reid I, et al.: MOTChallenge: A Benchmark for Single-Camera Multiple Tar- get Tracking. Available from: https://arxiv. org/abs/2010.07548

work page arXiv 2010
[76]

MOT20: A benchmark for multi object tracking in crowded scenes

Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, et al. MOT20: A benchmark for multi object tracking in crowded scenes. arXiv:200309003[cs]. 2020 Mar;ArXiv: 2003.09003

work page arXiv 2020
[77]

HOTA: A Higher Order Metric for Evaluating Multi- object Tracking

Luiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taix´ e L, et al. HOTA: A Higher Order Metric for Evaluating Multi- object Tracking. Int J Comput Vision. 2021 Feb;129(2):548–578. https://doi.org/10. 1007/s11263-020-01375-2

work page 2021
[78]

Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking

Ristani E, Solera F, Zou RS, Cucchiara R, Tomasi C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. CoRR. 2016;abs/1609.01775. 1609.01775

work page internal anchor Pith review Pith/arXiv arXiv 2016
[79]

Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Track- ing in Video: Data, Metrics, and Proto- col

Kasturi R, Goldgof D, Soundararajan P, Manohar V, Garofolo J, Bowers R, et al. Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Track- ing in Video: Data, Metrics, and Proto- col. IEEE Transactions on Pattern Analy- sis and Machine Intelligence. 2009;31(2):319–

work page 2009
[80]

https://doi.org/10.1109/TPAMI.2008. 57

work page doi:10.1109/tpami.2008 2008

Showing first 80 references.

[1] [1]

Ding S, Schneider L, Cordts M, Gall J.: ADA-Track++: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

work page

[2] [2]

Explor- ing Simple 3D Multi-Object Tracking for Autonomous Driving

Luo C, Yang X, Yuille A. Explor- ing Simple 3D Multi-Object Tracking for Autonomous Driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2021. p. 10488– 10497

work page 2021

[3] [3]

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learn- ing

Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, et al. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learn- ing. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

work page

[4] [4]

SPAMming Labels: Efficient Anno- tations for the Trackers of Tomorrow

Cetintas O, Meinhardt T, Bras´ o G, Leal- Taix´ e L. SPAMming Labels: Efficient Anno- tations for the Trackers of Tomorrow. In: European Conference on Computer Vision (ECCV); 2024

work page 2024

[5] [5]

Effi- ciently Scaling Up Video Annotation with Crowdsourced Marketplaces

Vondrick C, Ramanan D, Patterson D. Effi- ciently Scaling Up Video Annotation with Crowdsourced Marketplaces. In: Daniilidis K, Maragos P, Paragios N, editors. Computer Vision – ECCV 2010. Berlin, Heidelberg: Springer Berlin Heidelberg; 2010. p. 610–623

work page 2010

[6] [6]

DiffMOT: A Real-time Diffusion- based Multiple Object Tracker with Non- linear Prediction

Lv W, Huang Y, Zhang N, Lin RS, Han M, Zeng D. DiffMOT: A Real-time Diffusion- based Multiple Object Tracker with Non- linear Prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024. p. 19321– 19330

work page 2024

[7] [7]

Available from: https:// arxiv.org/abs/2303.10404

Qin Z, Zhou S, Wang L, Duan J, Hua G, Tang W.: MotionTrack: Learning Robust Short-term and Long-term Motions for Multi- Object Tracking. Available from: https:// arxiv.org/abs/2303.10404

work page arXiv

[8] [8]

Observation-centric sort: Rethinking sort for robust multi-object tracking

Cao J, Pang J, Weng X, Khirodkar R, Kitani K. Observation-centric sort: Rethinking sort for robust multi-object tracking. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

work page

[9] [9]

Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking? In: Oh AH, Agarwal A, Belgrave D, Cho K, editors

Dendorfer P, Yugay V, Osep A, Leal-Taix´ e L. Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking? In: Oh AH, Agarwal A, Belgrave D, Cho K, editors. Advances in Neural Information Pro- cessing Systems; 2022. Available from: https: //openreview.net/forum?id=3r0yLLCo4fF

work page 2022

[10] [10]

Learning an Image-Based Motion Context for Multiple People Track- ing

Leal-Taix´ e L, Fenzi M, Kuznetsova A, Rosen- hahn B, Savarese S. Learning an Image-Based Motion Context for Multiple People Track- ing. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition; 2014. p. 3542–3549

work page 2014

[11] [11]

Available from: https://arxiv.org/abs/2003.08177

Wang G, Yang S, Liu H, Wang Z, Yang Y, Wang S, et al.: High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification. Available from: https://arxiv.org/abs/2003.08177

work page arXiv 2003

[12] [12]

Unsupervised Pre-training for Person Re-identification

Fu D, Chen D, Bao J, Yang H, Yuan L, Zhang L, et al. Unsupervised Pre-training for Person Re-identification. Proceedings of the IEEE conference on computer vision and pattern recognition. 2021

work page 2021

[13] [13]

TransReID: Transformer-Based Object Re-Identification

He S, Luo H, Wang P, Wang F, Li H, Jiang W. TransReID: Transformer-Based Object Re-Identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2021. p. 15013– 15022

work page 2021

[14] [14]

Key- point Promptable Re-Identification

Somers V, Alahi A, Vleeschouwer CD. Key- point Promptable Re-Identification. In: Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, Septem- ber 29-October 4, 2024, Proceedings, Part LXXIX. vol. 15137 of Lecture Notes in Computer Science. Springer; 2024. p. 216–

work page 2024

[15] [15]

1007/978-3-031-72986-7 13

Available from: https://doi.org/10. 1007/978-3-031-72986-7 13

work page

[16] [16]

Simple online and realtime tracking

Bewley A, Ge Z, Ott L, Ramos F, Upcroft B. Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP); 2016. p. 3464–3468

work page 2016

[17] [17]

Simple Online and Realtime Tracking with a Deep Association Metric

Wojke N, Bewley A, Paulus D. Simple Online and Realtime Tracking with a Deep Association Metric. In: 2017 IEEE Inter- national Conference on Image Processing (ICIP). IEEE; 2017. p. 3645–3649

work page 2017

[18] [18]

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Zhang Y, Sun P, Jiang Y, Yu D, Weng F, Yuan Z, et al. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. Proceedings of the European Conference on Computer Vision (ECCV). 2022

work page 2022

[19] [19]

Hybrid-sort: Weak cues matter for online multi-object tracking

Yang M, Han G, Yan B, Zhang W, Qi J, Lu H, et al. Hybrid-sort: Weak cues matter for online multi-object tracking. In: Proceed- ings of the AAAI Conference on Artificial Intelligence. vol. 38; 2024. p. 6504–6512

work page 2024

[20] [20]

MOTR: End-to-End Multiple- Object Tracking with Transformer

Zeng F, Dong B, Zhang Y, Wang T, Zhang X, Wei Y. MOTR: End-to-End Multiple- Object Tracking with Transformer. In: 14 Computer Vision – ECCV 2022: 17th Euro- pean Conference, Tel Aviv, Israel, Octo- ber 23–27, 2022, Proceedings, Part XXVII. Berlin, Heidelberg: Springer-Verlag; 2022. p. 659–675. Available from: https://doi.org/10. 1007/978-3-031-19812-0 38

work page 2022

[21] [21]

MOTRv2: Boot- strapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

Zhang Y, Wang T, Zhang X. MOTRv2: Boot- strapping End-to-End Multi-Object Tracking by Pretrained Object Detectors. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE

work page 2023

[22] [22]

& Chen, C

p. 22056–22065. Available from: http:// dx.doi.org/10.1109/CVPR52729.2023.02112

work page doi:10.1109/cvpr52729.2023.02112 2023

[23] [23]

MeMOTR: Long- Term Memory-Augmented Transformer for Multi-Object Tracking

Gao R, Wang L. MeMOTR: Long- Term Memory-Augmented Transformer for Multi-Object Tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2023. p. 9901– 9910

work page 2023

[24] [24]

Available from: https://arxiv.org/abs/2403.16848

Gao R, Qi J, Wang L.: Multiple Object Tracking as ID Prediction. Available from: https://arxiv.org/abs/2403.16848

work page arXiv

[25] [25]

Learning a Neural Solver for Multiple Object Tracking

Bras´ o G, Leal-Taix´ e L. Learning a Neural Solver for Multiple Object Tracking. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2020

work page 2020

[26] [26]

Multi- Object Tracking and Segmentation Via Neu- ral Message Passing

Bras´ o G, Cetintas O, Leal-Taix´ e L. Multi- Object Tracking and Segmentation Via Neu- ral Message Passing. International Journal of Computer Vision. 2022;https://doi.org/10. 1007/s11263-022-01678-6

work page 2022

[27] [27]

Uni- fying Short and Long-Term Tracking With Graph Hierarchies

Cetintas O, Bras´ o G, Leal-Taix´ e L. Uni- fying Short and Long-Term Tracking With Graph Hierarchies. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023. p. 22877–22887

work page 2023

[28] [28]

Multi- scene generalized trajectory global graph solver with composite nodes for multiple object tracking

Gao Y, Xu H, Li J, Wang N, Gao X. Multi- scene generalized trajectory global graph solver with composite nodes for multiple object tracking. In: Proceedings of the Thirty-Eighth AAAI Conference on Artifi- cial Intelligence and Thirty-Sixth Confer- ence on Innovative Applications of Artifi- cial Intelligence and Fourteenth Symposium on Educational Advances...

work page doi:10.1609/aaai.v38i3.27953 2024

[29] [29]

The Architectural Implications of Autonomous Driving: Con- straints and Acceleration

Lin SC, Zhang Y, Hsu CH, Skach M, Haque ME, Tang L, et al. The Architectural Implications of Autonomous Driving: Con- straints and Acceleration. SIGPLAN Not. 2018 Mar;53(2):751–766. https://doi.org/10. 1145/3296957.3173191

work page arXiv 2018

[30] [30]

VETRA: A Dataset for Vehicle Tracking in Aerial Imagery – New Challenges for Multi-Object Tracking

Hellekes J, M¨ uhlhaus M, Bahmanyar R, Azimi SM, Kurz F. VETRA: A Dataset for Vehicle Tracking in Aerial Imagery – New Challenges for Multi-Object Tracking. In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, Septem- ber 29–October 4, 2024, Proceedings, Part LXXXV. Berlin, Heidelberg: Springer-Verlag

work page 2024

[31] [31]

p. 52–70. Available from: https://doi. org/10.1007/978-3-031-73013-9 4

work page doi:10.1007/978-3-031-73013-9

[32] [32]

KIT - Insti- tute of Photogrammetry and Remote Sensing (IPF)

Schmidt F.: Data Set for Tracking Vehicles in Aerial Image Sequences. KIT - Insti- tute of Photogrammetry and Remote Sensing (IPF). https://www.ipf.kit.edu/downloads data set AIS vehicle tracking.php

work page

[33] [33]

Available from: https://arxiv.org/ abs/2405.15755

Han X, Oishi N, Tian Y, Ucurum E, Young R, Chatwin C, et al.: ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking. Available from: https://arxiv.org/ abs/2405.15755

work page arXiv

[34] [34]

MambaTrack: A Simple Baseline for Multiple Object Track- ing with State Space Model

Xiao C, Cao Q, Luo Z, Lan L. MambaTrack: A Simple Baseline for Multiple Object Track- ing with State Space Model. In: Proceedings of the 32nd ACM International Conference on Multimedia. MM ’24. New York, NY, USA: Association for Computing Machinery; 2024. p. 4082–4091. Available from: https://doi. org/10.1145/3664647.3680944

work page doi:10.1145/3664647.3680944 2024

[35] [35]

Focus On Details: Online Multi- Object Tracking with Diverse Fine-Grained Representation

Ren H, Han S, Ding H, Zhang Z, Wang H, Wang F. Focus On Details: Online Multi- Object Tracking with Diverse Fine-Grained Representation. In: 2023 IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR); 2023. p. 11289–11298. 15

work page 2023

[36] [36]

Learning by tracking: Siamese CNN for robust target association

Leal-Taix´ e L, Canton-Ferrer C, Schindler K. Learning by tracking: Siamese CNN for robust target association. CoRR. 2016;abs/1604.07866. 1604.07866

work page internal anchor Pith review Pith/arXiv arXiv 2016

[37] [37]

BoT- SORT: Robust associations multi-pedestrian tracking

Aharon N, Orfaig R, Bobrovsky BZ.: BoT- SORT: Robust Associations Multi-Pedestrian Tracking. Available from: https://arxiv.org/ abs/2206.14651

work page arXiv

[38] [38]

Available from: https://arxiv.org/abs/2206.04656

Seidenschwarz J, Bras´ o G, Serrano VC, Elezi I, Leal-Taix´ e L.: Simple Cues Lead to a Strong Multi-Object Tracker. Available from: https://arxiv.org/abs/2206.04656

work page arXiv

[39] [39]

End-to- End Object Detection with Transform- ers

Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to- End Object Detection with Transform- ers. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I. Berlin, Heidelberg: Springer-Verlag; 2020. p. 213–229. Available from: https://doi.org/10. 1007/978-3-030-58452-8 13

work page 2020

[40] [40]

TrackFormer: Multi-Object Tracking with Transformers

Meinhardt T, Kirillov A, Leal-Taixe L, Feichtenhofer C. TrackFormer: Multi-Object Tracking with Transformers. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2022

work page 2022

[41] [41]

MeMOT: Multi-Object Tracking with Memory

Cai J, Xu M, Li W, Xiong Y, Xia W, Tu Z, et al. MeMOT: Multi-Object Tracking with Memory. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022. p. 8080–8090

work page 2022

[42] [42]

Tracking without bells and whistles

Bergmann P, Meinhardt T, Leal-Taix´ e L. Tracking without bells and whistles. In: The IEEE International Conference on Computer Vision (ICCV); 2019

work page 2019

[43] [43]

How To Train Your Deep Multi-Object Tracker

Xu Y, Osep A, Ban Y, Horaud R, Leal- Taix´ e L, Alameda-Pineda X. How To Train Your Deep Multi-Object Tracker. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

work page

[44] [44]

Strongsort: Make deepsort great again

Du Y, Zhao Z, Song Y, Zhao Y, Su F, Gong T, et al. Strongsort: Make deepsort great again. IEEE Transactions on Multimedia. 2023

work page 2023

[45] [45]

Global Tracking Transformers

Zhou X, Yin T, Koltun V, Kr¨ ahenb¨ uhl P. Global Tracking Transformers. In: CVPR

work page

[46] [46]

Multiple object tracking using k- shortest paths optimization

Berclaz J, Fleuret F, Turetken E, Fua P. Multiple object tracking using k- shortest paths optimization. IEEE TPAMI. 2011;33(9):1806–1819

work page 2011

[47] [47]

Global data associ- ation for multi-object tracking using network flows

Zhang L, Li Y, Nevatia R. Global data associ- ation for multi-object tracking using network flows. In: CVPR; 2008

work page 2008

[48] [48]

Multiple People Tracking by Lifted Multi- cut and Person Re-Identification

Tang S, Andriluka M, Andres B, Schiele B. Multiple People Tracking by Lifted Multi- cut and Person Re-Identification. In: CVPR

work page

[49] [49]

Gmcp- tracker: Global multi-object tracking using generalized minimum clique graphs

Zamir AR, Dehghan A, Shah M. Gmcp- tracker: Global multi-object tracking using generalized minimum clique graphs. In: ECCV. Springer; 2012. p. 343–356

work page 2012

[50] [50]

Subgraph Decomposition for Multi-Target Tracking

Tang S, Andres B, Andriluka M, Schiele B. Subgraph Decomposition for Multi-Target Tracking. In: Proceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition (CVPR); 2015

work page 2015

[51] [51]

Lifted disjoint paths with appli- cation in multiple object tracking

Hornakova A, Henschel R, Rosenhahn B, Swoboda P. Lifted disjoint paths with appli- cation in multiple object tracking. In: ICML. PMLR; 2020. p. 4364–4375

work page 2020

[52] [52]

Making Higher Order MOT Scalable: An Efficient Approx- imate Solver for Lifted Disjoint Paths

Hornakova A, Kaiser T, Swoboda P, Rolinek M, Rosenhahn B, Henschel R. Making Higher Order MOT Scalable: An Efficient Approx- imate Solver for Lifted Disjoint Paths. In: ICCV; 2021. p. 6330–6340

work page 2021

[53] [53]

Multi-target Tracking by Lagrangian Relaxation to Min-Cost Network Flow

Butt A, Collins R. Multi-target Tracking by Lagrangian Relaxation to Min-Cost Network Flow. CVPR. 2013

work page 2013

[54] [54]

End-to-End Learning Deep CRF Models for Multi-Object Tracking Deep CRF Models

Xiang J, Xu G, Ma C, Hou J. End-to-End Learning Deep CRF Models for Multi-Object Tracking Deep CRF Models. IEEE Trans- actions on Circuits and Systems for Video Technology. 2021;31(1):275–288. https://doi. org/10.1109/TCSVT.2020.2975842. 16

work page doi:10.1109/tcsvt.2020.2975842 2021

[55] [55]

Multi-Object Tracking Using Color, Texture and Motion

Takala V, Pietikainen M. Multi-Object Tracking Using Color, Texture and Motion. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition; 2007. p. 1–7

work page 2007

[56] [56]

Learning by Tracking: Siamese CNN for Robust Target Association

Leal-Taixe L, Canton-Ferrer C, Schindler K. Learning by Tracking: Siamese CNN for Robust Target Association. In: CVPRW

work page

[57] [57]

Multi- Object Tracking With Quadruplet Convolu- tional Neural Networks

Son J, Baek M, Cho M, Han B. Multi- Object Tracking With Quadruplet Convolu- tional Neural Networks. In: CVPR; 2017

work page 2017

[58] [58]

Features for Multi- Target Multi-Camera Tracking and Re- Identification

Ristani E, Tomasi C. Features for Multi- Target Multi-Camera Tracking and Re- Identification. In: CVPR; 2018

work page 2018

[59] [59]

Tracking the Untrackable: Learning to Track Multi- ple Cues With Long-Term Dependencies

Sadeghian A, Alahi A, Savarese S. Tracking the Untrackable: Learning to Track Multi- ple Cues With Long-Term Dependencies. In: ICCV; 2017

work page 2017

[60] [60]

Online Multi-Target Track- ing Using Recurrent Neural Networks

Milan A, Rezatofighi SH, Dick A, Reid I, Schindler K. Online Multi-Target Track- ing Using Recurrent Neural Networks. In: Proceedings of the Thirty-First AAAI Con- ference on Artificial Intelligence; 2017

work page 2017

[61] [61]

Learning a Proposal Classifier for Mul- tiple Object Tracking

Dai P, Weng R, Choi W, Zhang C, He Z, Ding W. Learning a Proposal Classifier for Mul- tiple Object Tracking. In: CVPR; 2021. p. 2443–2452

work page 2021

[62] [62]

Learn- able graph matching: Incorporating graph partitioning with deep feature learning for multiple object tracking

He J, Huang Z, Wang N, Zhang Z. Learn- able graph matching: Incorporating graph partitioning with deep feature learning for multiple object tracking. In: CVPR; 2021. p. 5299–5309

work page 2021

[63] [63]

Graph Networks for Multiple Object Tracking

Li J, Gao X, Jiang T. Graph Networks for Multiple Object Tracking. In: Proceedings of the IEEE/CVF Winter Conference on Appli- cations of Computer Vision (WACV); 2020

work page 2020

[64] [64]

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi- Feature Learning

Weng X, Wang Y, Man Y, Kitani KM. GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi- Feature Learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020

work page 2020

[65] [65]

GSM: Graph Similarity Model for Multi-Object Tracking

Liu Q, Chu Q, Liu B, Yu N. GSM: Graph Similarity Model for Multi-Object Tracking. In: IJCAI; 2020. p. 530–536

work page 2020

[66] [66]

Multi- Object Tracking and Segmentation Via Neu- ral Message Passing

Bras´ o G, Cetintas O, Leal-Taix´ e L. Multi- Object Tracking and Segmentation Via Neu- ral Message Passing. International Journal of Computer Vision. 2022;130(12):3035–3053

work page 2022

[67] [67]

2021 , url =

Wang Y, Kitani K, Weng X. Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE Press; 2021. p. 13708–13715. Available from: https://doi. org/10.1109/ICRA48506.2021.9561110

work page doi:10.1109/icra48506.2021.9561110 2021

[68] [68]

Graph Networks for Multiple Object Tracking

Li J, Gao X, Jiang T. Graph Networks for Multiple Object Tracking. In: 2020 IEEE Winter Conference on Applications of Com- puter Vision (WACV); 2020. p. 708–717

work page 2020

[69] [69]

The Hungarian Method for the Assignment Problem

Kuhn HW. The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly. 1955 March;2(1–2):83–

work page 1955

[70] [70]

https://doi.org/10.1002/nav.3800020109

work page doi:10.1002/nav.3800020109

[71] [71]

Learnable Graph Matching: Incorporating Graph Parti- tioning With Deep Feature Learning for Mul- tiple Object Tracking

He J, Huang Z, Wang N, Zhang Z. Learnable Graph Matching: Incorporating Graph Parti- tioning With Deep Feature Learning for Mul- tiple Object Tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021. p. 5299–5309

work page 2021

[72] [72]

Global data associ- ation for multi-object tracking using network flows

Zhang L, Li Y, Nevatia R. Global data associ- ation for multi-object tracking using network flows. In: 2008 IEEE Conference on Com- puter Vision and Pattern Recognition. IEEE

work page 2008

[73] [73]

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Sun P, Cao J, Jiang Y, Yuan Z, Bai S, Kitani K, et al. DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022. . 17

work page 2022

[74] [74]

SportsMOT: A Large Multi-Object Track- ing Dataset in Multiple Sports Scenes

Cui Y, Zeng C, Zhao X, Yang Y, Wu G, Wang L. SportsMOT: A Large Multi-Object Track- ing Dataset in Multiple Sports Scenes. arXiv preprint arXiv:230405170. 2023

work page 2023

[75] [75]

Available from: https://arxiv

Dendorfer P, Oˇ sep A, Milan A, Schindler K, Cremers D, Reid I, et al.: MOTChallenge: A Benchmark for Single-Camera Multiple Tar- get Tracking. Available from: https://arxiv. org/abs/2010.07548

work page arXiv 2010

[76] [76]

MOT20: A benchmark for multi object tracking in crowded scenes

Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, et al. MOT20: A benchmark for multi object tracking in crowded scenes. arXiv:200309003[cs]. 2020 Mar;ArXiv: 2003.09003

work page arXiv 2020

[77] [77]

HOTA: A Higher Order Metric for Evaluating Multi- object Tracking

Luiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taix´ e L, et al. HOTA: A Higher Order Metric for Evaluating Multi- object Tracking. Int J Comput Vision. 2021 Feb;129(2):548–578. https://doi.org/10. 1007/s11263-020-01375-2

work page 2021

[78] [78]

Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking

Ristani E, Solera F, Zou RS, Cucchiara R, Tomasi C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. CoRR. 2016;abs/1609.01775. 1609.01775

work page internal anchor Pith review Pith/arXiv arXiv 2016

[79] [79]

Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Track- ing in Video: Data, Metrics, and Proto- col

Kasturi R, Goldgof D, Soundararajan P, Manohar V, Garofolo J, Bowers R, et al. Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Track- ing in Video: Data, Metrics, and Proto- col. IEEE Transactions on Pattern Analy- sis and Machine Intelligence. 2009;31(2):319–

work page 2009

[80] [80]

https://doi.org/10.1109/TPAMI.2008. 57

work page doi:10.1109/tpami.2008 2008