MOT16: A Benchmark for Multi-Object Tracking
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. Recently, a new benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal of collecting existing and new data and creating a framework for the standardized evaluation of multiple object tracking methods. The first release of the benchmark focuses on multiple people tracking, since pedestrians are by far the most studied object in the tracking community. This paper accompanies a new release of the MOTChallenge benchmark. Unlike the initial release, all videos of MOT16 have been carefully annotated following a consistent protocol. Moreover, it not only offers a significant increase in the number of labeled boxes, but also provides multiple object classes besides pedestrians and the level of visibility for every single object of interest.
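The standardized evaluation the abstract refers to is typically reported with the CLEAR-MOT metrics. As a minimal sketch (not the benchmark's official evaluation code), the headline accuracy score MOTA combines misses, false positives, and identity switches against the total number of ground-truth boxes:

```python
# Hedged sketch of the CLEAR-MOT accuracy score (MOTA), as commonly
# reported on MOTChallenge-style leaderboards:
#   MOTA = 1 - (FN + FP + IDSW) / GT
# where GT is the total number of ground-truth boxes over all frames.
# Function name and signature here are illustrative, not from the paper.
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt_boxes: int) -> float:
    if num_gt_boxes <= 0:
        raise ValueError("ground truth must contain at least one box")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_boxes

# Example: 120 misses, 80 false alarms, 10 ID switches over 1000 GT boxes
print(mota(120, 80, 10, 1000))  # -> 0.79
```

Note that MOTA can be negative when a tracker makes more errors than there are ground-truth boxes; official toolkits pair it with precision- and association-oriented scores such as MOTP and IDF1.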
Forward citations
Cited by 12 Pith papers
- Clip-level Uncertainty and Temporal-aware Active Learning for End-to-End Multi-Object Tracking: CUTAL scores multi-frame clips for uncertainty and enforces temporal diversity to train transformer MOT models to near full-supervision performance with 50% of the labels.
- CityOS: Privacy Architecture for Urban Sensing: CityOS is an edge runtime that enforces a three-tier privacy API for urban sensors: local raw data, differentially private single-location stats, and cross-location aggregates with per-user budgets enforced on devices.
- Learned Nonlocal Feature Matching and Filtering for RAW Image Denoising: A learnable nonlocal block that mimics classical neighbor matching and collaborative filtering on multiscale features produces competitive RAW denoising with far fewer parameters than current deep models and generaliz...
- Towards Unconstrained Human-Object Interaction: Introduces the U-HOI task and shows MLLMs plus a language-to-graph pipeline can handle human-object interactions without any predefined vocabulary at training or inference time.
- STORM: End-to-End Referring Multi-Object Tracking in Videos: STORM is an end-to-end MLLM for referring multi-object tracking that uses task-composition learning to leverage sub-task data and introduces the STORM-Bench dataset, achieving SOTA results.
- ERPPO: Entropy Regularization-based Proximal Policy Optimization: ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.
- SAMOFT: Robust Multi-Object Tracking via Region and Flow: SAMOFT improves multi-object tracking by using SAM segmentation and optical flow for pixel-level motion matching, flexible centroid correction, and training-free motion pattern fixes on top of standard Kalman and ReID...
- Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking: TCMP achieves SOTA MOT metrics (HOTA 63.4%, IDF1 65.0%, AssA 49.1%) with 0.014x the parameters and 0.05x the FLOPs of the previous best method by using a simple dilated TCN regressor.
- Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics: Distilled SAM 3 and DINOv3 models deliver near-teacher accuracy in pig tracking (92.29% MOTA, 96.15% IDF1) and behavior classification while achieving 7.77x parameter reduction and fitting on Jetson Orin NX with headroom.
- Hypergraph-State Collaborative Reasoning for Multi-Object Tracking: HyperSSM integrates hypergraphs and state space models to let correlated objects mutually refine motion estimates, stabilizing trajectories under noise and occlusion for state-of-the-art multi-object tracking.
- Attention Is not Everything: Efficient Alternatives for Vision: A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.
- Intelligent Traffic Monitoring with YOLOv11: A Case Study in Real-Time Vehicle Detection: A YOLOv11-based desktop application detects and counts vehicles in traffic videos with 67-96% accuracy and high F1 scores for cars and trucks.