CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark

Ahmed Durmush; Alan Lukežič; Jani Käpylä; Jiří Matas; Joni-Kristian Kämäräinen; Matej Kristan; Ugur Kart

arxiv: 1907.00618 · v1 · pith:KEVUO6OMnew · submitted 2019-07-01 · 💻 cs.CV

Pith reviewed 2026-05-25 12:06 UTC · model grok-4.3

classification 💻 cs.CV

keywords: visual object tracking · long-term tracking · performance measures · benchmark dataset · color and depth · re-detection · tracking taxonomy

The pith

New performance measures for long-term tracking generalize short-term metrics and remain robust to sparse annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a long-term visual object tracking evaluation methodology and benchmark. New performance measures are designed to maximize analysis probing strength, outperforming prior ones by better distinguishing tracking behaviors and offering greater interpretation potential. These measures generalize short-term performance measures, linking the two problems, while remaining highly robust to temporal annotation sparsity. This robustness allows annotation of sequences hundreds of times longer than existing datasets without added manual effort. A new challenging dataset of sequences with many target disappearances, using color and depth, is introduced along with a taxonomy to position trackers on the short-term to long-term spectrum.

Core claim

Following a long-term tracking definition, the authors design performance measures that provide stronger analysis, outperform existing ones in interpretation and behavior distinction, generalize short-term measures to link the problems, and stay robust to annotation sparsity for much longer sequences. The CDTB dataset of carefully selected sequences with frequent target disappearances supports an extensive evaluation of the largest number of long-term trackers, their comparison to short-term state-of-the-art, analysis of architecture implementations, and exploration of re-detection and model update strategies for drift.

What carries the argument

The new long-term performance measures that generalize short-term ones and tolerate annotation sparsity, paired with the CDTB color-and-depth dataset containing many target disappearances.
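
This page does not spell out the formulas behind the measures. As a minimal sketch, assume the tracking precision, recall and F-measure formulation used in the authors' earlier long-term evaluation work: precision averages overlap over the frames where the tracker reports a position, and recall averages it over the frames where the target is visible. The function names, the `(x, y, width, height)` box layout and the confidence `threshold` parameter are illustrative assumptions, not the VOT toolkit API.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f(
    predictions: Sequence[Optional[Box]],
    scores: Sequence[float],
    ground_truth: Sequence[Optional[Box]],
    threshold: float,
) -> Tuple[float, float, float]:
    """Hypothetical helper: tracking precision, recall and F-measure."""
    # A prediction counts as reported only if its confidence reaches the threshold.
    reported = [p if (p is not None and s >= threshold) else None
                for p, s in zip(predictions, scores)]
    # Overlap is zero whenever the prediction or the target is absent.
    overlaps = [iou(p, g) if (p is not None and g is not None) else 0.0
                for p, g in zip(reported, ground_truth)]
    n_reported = sum(p is not None for p in reported)
    n_visible = sum(g is not None for g in ground_truth)
    precision = sum(overlaps) / n_reported if n_reported else 0.0
    recall = sum(overlaps) / n_visible if n_visible else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure
```

Under this formulation, a sequence in which the target is always visible and the tracker always reports a position gives precision and recall both equal to the mean overlap, which is one way to read the claim that the measures generalize short-term ones.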

If this is right

The measures link short-term and long-term tracking problems.
Annotation of sequences hundreds of times longer becomes possible without increasing manual labor.
Influence of tracking architecture implementations on long-term performance can be systematically analyzed.
Re-detection strategies and visual model update strategies can be compared for their effect on long-term tracking drift.
The methodology integrates into the VOT toolkit to automate experimental analysis.
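
One way to probe the annotation-sparsity implication in the list above is to keep only every k-th annotated frame and recompute a measure on what remains. The sketch below does this for a recall-style score computed from per-frame overlaps. It is an illustration under stated assumptions, not the paper's experimental protocol: the function names are hypothetical, and `None` is used here to mark frames where the target is absent.

```python
from typing import Dict, List, Optional, Sequence


def recall_from_overlaps(overlaps: Sequence[Optional[float]]) -> float:
    """Mean overlap over annotated frames where the target is visible.

    Each entry is the overlap on a visible-target frame, or None when
    the target is absent on that frame (assumed convention).
    """
    visible = [o for o in overlaps if o is not None]
    return sum(visible) / len(visible) if visible else 0.0


def subsample(overlaps: Sequence[Optional[float]], step: int,
              offset: int = 0) -> List[Optional[float]]:
    """Keep every `step`-th frame, starting at `offset`, to mimic sparse annotation."""
    return list(overlaps[offset::step])


def sparsity_profile(overlaps: Sequence[Optional[float]],
                     steps: Sequence[int]) -> Dict[int, float]:
    """Recall recomputed at each annotation step (1 means fully annotated)."""
    return {step: recall_from_overlaps(subsample(overlaps, step))
            for step in steps}
```

If the values in the profile stay close to the fully annotated score as `step` grows, the measure is behaving robustly on that sequence. Regular subsampling can alias with periodic tracker behavior, so averaging over several `offset` values is a reasonable precaution.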

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Trackers that add explicit re-detection modules may show measurable gains on sequences with frequent disappearances.
The color and depth modalities together could encourage development of multimodal trackers that maintain identity across occlusions.
The proposed taxonomy may help classify existing trackers and guide creation of hybrid systems that adapt between short-term and long-term modes.
Widespread adoption of the measures could standardize reporting across short-term and long-term tracking papers.

Load-bearing premise

The carefully selected sequences with many target disappearances form a representative and sufficiently challenging test of long-term tracking behavior.

What would settle it

An independent collection of long sequences with target disappearances in which the new measures fail to distinguish tracking behaviors better than prior measures or lose their generalization to short-term metrics.

Figures

Figures reproduced from arXiv: 1907.00618 by Ahmed Durmush, Alan Lukežič, Jani Käpylä, Jiří Matas, Joni-Kristian Kämäräinen, Matej Kristan, Ugur Kart.

**Figure 2.** Two of the three sensors used in dataset acquisition: ToF [...]

**Figure 3.** The overall tracking performance is presented as tracking [...]

**Figure 4.** Tracking precision and recall calculated at the optimal [...]

**Figure 5.** Tracking performance w.r.t. visual attributes. The first eleven attributes correspond to scenarios with a visible target (showing F [...]

**Figure 6.** No redetection experiment. Tracking recall is shown on [...]

Original abstract

A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term tackers and comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations to long-term performance and explore various re-detection strategies as well as influence of visual model update strategies to long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper ships a solid new RGB-D long-term tracking dataset and measures, but the superiority claims rest on a narrowly selected set of sequences that may limit how far the results generalize.

The main deliverable is CDTB, a color-depth dataset built around sequences with frequent target disappearances, plus new performance measures meant to handle long-term tracking and annotation sparsity better than prior ones. They also add a short-to-long taxonomy and run a large evaluation of trackers, including re-detection and update strategies, all wired into the VOT toolkit. That package is useful on its face for anyone who needs data with depth and explicit long-term challenges. The measures are shown to reduce to short-term ones in the right limits and to stay stable when annotations are thinned out, which is a practical engineering point. The analysis of architecture choices and drift is concrete and points to real implementation issues. The dataset construction itself looks careful and the numbers are reported in enough detail to be checked.

The soft spot is exactly the one the stress-test flags: the sequences were chosen for many disappearances, so the claim that the new measures better distinguish behaviors and generalize may not travel outside this collection. If the selection over-represents certain disappearance patterns or scene types, the robustness and interpretation advantages could shrink on other long-term data. The paper does not appear to test the measures on independent long-term corpora to close that loop.

This work is aimed at the long-term tracking subgroup in computer vision, especially groups that already use VOT or need RGB-D benchmarks. It is the kind of resource paper that earns referee time because the dataset and code are real outputs others can build on, even if the measure claims need more cross-dataset checks. I would send it to review rather than desk-reject.

Referee Report

1 major / 2 minor

Summary. The paper proposes a long-term visual object tracking evaluation methodology and benchmark, including new performance measures designed to maximize analysis probing strength for long-term scenarios. These measures are claimed to outperform existing ones in interpretation and distinguishing tracking behaviors, while generalizing short-term measures and remaining robust to temporal annotation sparsity. A new challenging dataset CDTB is introduced with sequences featuring many target disappearances, along with a tracking taxonomy positioning trackers on the short-term/long-term spectrum. The work includes extensive evaluation of numerous long-term trackers versus short-term ones, analysis of architectures, re-detection strategies, and model updates, with integration into the VOT toolkit for automated benchmarking.

Significance. If the claims hold, this provides a valuable standardized benchmark and measures for long-term tracking, an area with limited prior resources compared to short-term tracking. The robustness to annotation sparsity and linkage between short- and long-term problems could facilitate scalable evaluation and development of trackers handling disappearances and drift. Credit is due for the empirical scale (largest number of long-term trackers evaluated) and practical integration with the VOT toolkit.

major comments (1)

[Dataset and Benchmark sections] The central claim that the new measures outperform existing ones and generalize short-term measures rests on evaluations using the CDTB dataset of 'carefully selected' sequences. However, without explicit criteria, statistical representativeness analysis, or comparison to broader distributions of long-term tracking challenges (e.g., disappearance patterns or scene types), it is unclear whether superior performance and robustness would transfer beyond this specific data selection.

minor comments (2)

[Abstract] The abstract contains a typo: 'tackers' should be 'trackers'.
[Evaluation] Clarify in the results how the new measures were quantitatively compared to prior ones (e.g., specific tables or figures showing interpretation potential and behavior distinction).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment of the work. We address the major comment below.

Referee: [Dataset and Benchmark sections] The central claim that the new measures outperform existing ones and generalize short-term measures rests on evaluations using the CDTB dataset of 'carefully selected' sequences. However, without explicit criteria, statistical representativeness analysis, or comparison to broader distributions of long-term tracking challenges (e.g., disappearance patterns or scene types), it is unclear whether superior performance and robustness would transfer beyond this specific data selection.

Authors: We agree that the manuscript would be strengthened by explicitly stating the sequence selection criteria. The CDTB sequences were chosen to emphasize long-term tracking challenges, specifically a high frequency of target disappearances and reappearances (on average several times per sequence), combined with diversity in environments, lighting conditions, and motion patterns while ensuring both color and depth data are available. We will revise the Dataset section to list these criteria in detail. A formal statistical analysis comparing the dataset's disappearance pattern distribution to a hypothetical global distribution of all possible long-term videos is outside the scope of this work, as it would require constructing and annotating a much larger corpus. However, the performance measures themselves are derived directly from the formal definition of long-term tracking (target may disappear and reappear) and are shown both mathematically and empirically to generalize the short-term measures; their robustness to annotation sparsity is validated via controlled subsampling experiments independent of the specific sequence selection. These properties support the broader applicability of the methodology beyond the particular dataset.

Revision: yes

Circularity Check

0 steps flagged

Empirical benchmark paper with no derivation chain reducing to inputs

This is a dataset/benchmark paper that defines new performance measures following a long-term tracking definition, proposes a new dataset of selected sequences, and evaluates trackers empirically. No equations or claims reduce by construction to fitted parameters, self-citations, or renamed inputs. The generalization and robustness claims are shown via experiments on the new data rather than forced by definition. No load-bearing self-citation chains or ansatzes are invoked. This is the expected non-finding for empirical construction work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the new measures rest on the domain assumption that long-term tracking is defined by frequent target disappearances.
