Pedestrian Tracking by Probabilistic Data Association and Correspondence Embeddings

Borna Bićanić; Ivan Marković; Ivan Petrović; Marin Oršić; Siniša Šegvić

arXiv: 1907.07045 · v1 · pith:Y4EVAAQA · submitted 2019-07-16 · cs.CV · cs.LG · cs.RO

Pith reviewed 2026-05-24 20:49 UTC · model grok-4.3

classification: cs.CV · cs.LG · cs.RO

keywords: pedestrian tracking · data association · correspondence embeddings · JIPDA · multi-target tracking · deep features · moving camera · ego-motion

The pith

In moving-camera sequences with unknown ego-motion, global nearest-neighbor tracking of deep correspondence embeddings outperforms kinematic cues for pedestrian tracking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the relative value of position-velocity kinematics versus learned appearance features when linking detections of multiple pedestrians across video frames. In fixed-camera settings, a fine-tuned detector paired with joint integrated probabilistic data association (JIPDA) driven only by kinematics ranks first on the 3DMOT2015 benchmark. When the camera itself moves and its motion is unknown, this kinematic approach is outperformed by global nearest-neighbor matching on embeddings obtained by fine-tuning ResNet-18 features with an angular loss plus a margin term. The work also reports that feeding the embeddings directly into the JIPDA association step brings little additional benefit. The distinction matters because many practical tracking tasks take place on moving platforms whose ego-motion cannot be measured reliably.
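
To make the kinematic-only association step concrete, here is a minimal Python sketch of JIPDA-style marginal association probabilities and existence updates for a handful of tracks, computed by brute-force enumeration of joint events. It is an illustration under stated assumptions, not the authors' implementation: a single detection probability P_D, no gating probability, a uniform clutter density, and caller-supplied kinematic likelihoods g[t][m] (for example Gaussian innovation densities from a Kalman filter). The names joint_events and jipda_update are hypothetical.

```python
from typing import Iterator, Optional, Sequence, Tuple

Event = Tuple[Optional[int], ...]


def joint_events(n_targets: int, n_meas: int) -> Iterator[Event]:
    """Yield every feasible joint event: each target takes one measurement
    index or None (missed or non-existent), and no measurement is used twice."""
    def rec(t: int, used: frozenset, partial: list) -> Iterator[Event]:
        if t == n_targets:
            yield tuple(partial)
            return
        yield from rec(t + 1, used, partial + [None])
        for m in range(n_meas):
            if m not in used:
                yield from rec(t + 1, used | {m}, partial + [m])
    yield from rec(0, frozenset(), [])


def jipda_update(r: Sequence[float], g: Sequence[Sequence[float]],
                 p_d: float, clutter: float):
    """Return (beta, r_post) for existence probabilities r and likelihoods g.

    beta[t][0] is the probability that target t receives no measurement,
    beta[t][m + 1] the probability that it receives measurement m, and
    r_post[t] is the updated existence probability. Assumes p_d * r[t] < 1.
    """
    n_t, n_m = len(r), (len(g[0]) if g else 0)
    beta = [[0.0] * (n_m + 1) for _ in range(n_t)]
    total = 0.0
    for ev in joint_events(n_t, n_m):
        w = 1.0
        for t, m in enumerate(ev):
            # Unassigned: target missed or absent. Assigned: detected, likelihood vs clutter.
            w *= (1.0 - p_d * r[t]) if m is None else p_d * r[t] * g[t][m] / clutter
        total += w
        for t, m in enumerate(ev):
            beta[t][0 if m is None else m + 1] += w
    beta = [[b / total for b in row] for row in beta]
    # Any association confirms existence; the "no measurement" branch keeps only
    # the existing-but-missed fraction r(1 - P_D) / (1 - P_D r).
    r_post = [sum(beta[t][1:]) + beta[t][0] * r[t] * (1 - p_d) / (1 - p_d * r[t])
              for t in range(n_t)]
    return beta, r_post
```

The enumeration grows combinatorially with the number of tracks and detections, so practical JIPDA implementations rely on gating and track clustering; the brute-force loop is only meant to make the weights easy to inspect.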

Core claim

The central claim is that, for sequences captured by a moving camera whose ego-motion is unknown, the best tracking performance is obtained by discarding kinematic cues and instead performing global nearest-neighbor matching on deep correspondence embeddings. These embeddings are produced by fine-tuning the second block of ResNet-18 with an angular loss extended by a margin term. Direct insertion of the same embeddings into the JIPDA filter did not yield significant further gains, suggesting that the geometry of the embedding space for soft data association requires additional study.
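
For concreteness, the triplet form of the angular loss from Wang et al. (ICCV 2017, reference [8] below) is l = [ ||x_a − x_p||² − 4 tan²(α) ||x_n − x_c||² ]₊, where x_a, x_p and x_n are the anchor, positive and negative embeddings and x_c = (x_a + x_p)/2. The abstract does not spell out the margin term, so the sketch below assumes it is an additive constant inside the hinge; L2 normalization of the embeddings and the default α = 45° are also assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project an embedding onto the unit sphere (assumed preprocessing)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def angular_loss_with_margin(anchor, positive, negative,
                             alpha_deg: float = 45.0, margin: float = 0.0) -> float:
    """Angular triplet loss (Wang et al.) with a hypothetical additive margin."""
    xa, xp, xn = (l2_normalize(v) for v in (anchor, positive, negative))
    # Center of the anchor-positive pair; the negative is pushed away from it.
    xc = [(a + p) / 2.0 for a, p in zip(xa, xp)]
    tan2 = math.tan(math.radians(alpha_deg)) ** 2
    return max(0.0, sq_dist(xa, xp) - 4.0 * tan2 * sq_dist(xn, xc) + margin)
```

A positive margin keeps the hinge active on triplets that already satisfy the angular constraint, which is one plausible reading of "extended by a margin term"; the actual value used in the paper is not reported here.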

What carries the argument

Global nearest-neighbor tracking of deep correspondence embeddings trained by angular loss with margin on ResNet-18 features.
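
The global nearest-neighbor step can be sketched as a one-to-one assignment that minimizes the total embedding distance between existing tracks and new detections. The Python below is a minimal, hypothetical version: it uses one minus the cosine similarity as the cost, a gate of 0.5 beyond which a pair may not be matched, and a brute-force search in place of the Hungarian algorithm a real tracker would use. How the paper represents a track's embedding (last detection, running average, or otherwise) is not stated here, so track_embs is an assumption, as is the name gnn_assign.

```python
import itertools
import math
from typing import Dict, List, Sequence


def cosine_cost(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes non-zero embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def gnn_assign(track_embs: List[Sequence[float]],
               det_embs: List[Sequence[float]],
               gate: float = 0.5) -> Dict[int, int]:
    """Return {track index: detection index} minimizing total cost.

    A track left unmatched costs `gate`, so a match is only kept if it is
    cheaper than leaving the track unmatched; pairs above the gate are forbidden.
    """
    n_t, n_d = len(track_embs), len(det_embs)
    cost = [[cosine_cost(t, d) for d in det_embs] for t in track_embs]
    slots = list(range(n_d)) + [None] * n_t  # None = no detection for this track
    best: Dict[int, int] = {}
    best_cost = math.inf
    for choice in itertools.permutations(slots, n_t):
        total = 0.0
        for t, d in enumerate(choice):
            if d is None:
                total += gate
            elif cost[t][d] > gate:
                total = math.inf
                break
            else:
                total += cost[t][d]
        if total < best_cost:
            best_cost = total
            best = {t: d for t, d in enumerate(choice) if d is not None}
    return best
```

Unmatched detections would typically start new tracks and unmatched tracks would age out, but those policies are outside this sketch.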

If this is right

A fine-tuned convolutional detector combined with kinematic-only JIPDA produces the top-ranked submission on the fixed-camera 3DMOT2015 benchmark.
Appearance embeddings trained with angular loss plus margin enable reliable frame-to-frame matching when ego-motion is unmodeled.
Direct use of the embeddings inside the JIPDA association step brings no clear benefit over the nearest-neighbor approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Embeddings appear to encode identity information that is more robust to unmodeled camera motion than explicit position-velocity models.
The same nearest-neighbor strategy could be tested on other moving-platform tasks such as vehicle or drone tracking.
An adaptive system that selects kinematics or embeddings according to estimated ego-motion reliability might combine the strengths of both.

Load-bearing premise

The learned embeddings stay stable enough across viewpoint changes and occlusions that nearest-neighbor distances correctly identify the same pedestrian from one frame to the next.

What would settle it

A moving-camera sequence in which nearest-neighbor matching on these embeddings produces more identity switches and track breaks than a purely kinematic tracker would falsify the superiority claim.
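
Identity switches, the quantity this test hinges on, can be counted in a simplified CLEAR MOT sense: a switch is recorded whenever a ground-truth pedestrian is matched to a different track ID than the one it was last matched to. The sketch below assumes the per-frame ground-truth-to-track matches are already given (in practice they come from the benchmark's matching procedure); count_id_switches is a hypothetical helper, not the official evaluation code.

```python
from typing import Dict, Hashable, List


def count_id_switches(frames: List[Dict[Hashable, Hashable]]) -> int:
    """frames[k] maps ground-truth ID -> matched track ID in frame k.

    Ground-truth IDs that are unmatched (missed or occluded) are simply absent.
    """
    last_track: Dict[Hashable, Hashable] = {}
    switches = 0
    for matches in frames:
        for gt_id, track_id in matches.items():
            # Compare with the last track this pedestrian had, even if it was
            # unmatched for some frames in between (e.g. during an occlusion).
            if gt_id in last_track and last_track[gt_id] != track_id:
                switches += 1
            last_track[gt_id] = track_id
    return switches
```

For example, two pedestrians whose tracks swap labels after crossing contribute two switches, while a pedestrian who is briefly occluded and re-acquired by the same track contributes none.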

Figures

Figures reproduced from arXiv: 1907.07045 by Borna Bićanić, Ivan Marković, Ivan Petrović, Marin Oršić, Siniša Šegvić.

Figure 2: Distribution of scalar products of the deep embeddings.
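
A distribution like the one in Figure 2 can be produced by comparing scalar products of L2-normalized embeddings for pairs showing the same pedestrian against pairs showing different pedestrians. The sketch below assumes that this is what the figure plots and that identity labels are available; the helper names are hypothetical and the binning over [-1, 1] is arbitrary.

```python
import math
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale an embedding to unit length so scalar products equal cosines."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def scalar_products_by_identity(embs: List[Sequence[float]],
                                ids: List[Hashable]) -> Tuple[List[float], List[float]]:
    """Split all pairwise scalar products into same-identity and different-identity lists."""
    unit = [normalize(e) for e in embs]
    same: List[float] = []
    diff: List[float] = []
    for i, j in combinations(range(len(unit)), 2):
        s = sum(a * b for a, b in zip(unit[i], unit[j]))
        (same if ids[i] == ids[j] else diff).append(s)
    return same, diff


def histogram(values: List[float], n_bins: int = 10,
              lo: float = -1.0, hi: float = 1.0) -> List[int]:
    """Count values in equal-width bins over [lo, hi]; out-of-range values are clamped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        k = int((v - lo) / width)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return counts
```

With well-separated embeddings one would expect the same-identity histogram to concentrate near 1 and the different-identity histogram to sit lower; the overlap between the two is what a nearest-neighbor gate has to cut through.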

Original abstract

This paper studies the interplay between kinematics (position and velocity) and appearance cues for establishing correspondences in multi-target pedestrian tracking. We investigate tracking-by-detection approaches based on a deep learning detector, joint integrated probabilistic data association (JIPDA), and appearance-based tracking of deep correspondence embeddings. We first addressed the fixed-camera setup by fine-tuning a convolutional detector for accurate pedestrian detection and combining it with kinematic-only JIPDA. The resulting submission ranked first on the 3DMOT2015 benchmark. However, in sequences with a moving camera and unknown ego-motion, we achieved the best results by replacing kinematic cues with global nearest neighbor tracking of deep correspondence embeddings. We trained the embeddings by fine-tuning features from the second block of ResNet-18 using angular loss extended by a margin term. We note that integrating deep correspondence embeddings directly in JIPDA did not bring significant improvement. It appears that geometry of deep correspondence embeddings for soft data association needs further investigation in order to obtain the best from both worlds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Kinematics with JIPDA win on fixed-camera pedestrian tracking and get first on 3DMOT2015, while embeddings plus nearest-neighbor win on moving-camera sequences, but the two do not combine easily.

The paper's core finding is straightforward: on fixed-camera sequences, a fine-tuned detector plus kinematic-only JIPDA ranks first on 3DMOT2015. On moving-camera sequences with unknown ego-motion, switching to global nearest-neighbor matching on ResNet-18 block-2 embeddings trained with angular loss plus margin works better. The authors are explicit that feeding the same embeddings into JIPDA produced no real gain, which undercuts any claim that the embeddings are ready for soft probabilistic association. That honesty is useful. The work is mostly an empirical comparison of cue choices rather than a new algorithm, and the component pieces (JIPDA, ResNet fine-tuning, angular loss) are established. What is new is the reported ordering and the benchmark placement. The soft spots are the usual ones for this style of paper. The abstract and stress-test note give no error bars, no run-to-run variance, and no ablation on how much the margin term or the choice of block matters. The moving-camera result also rests on the untested assumption that the embeddings stay stable enough for reliable NN matching under viewpoint change and occlusion; the lack of a significant gain from JIPDA integration is consistent with only marginal invariance. No cross-validation on viewpoint-augmented data is described. This is the kind of paper that belongs in a tracking reading group for the concrete cue-selection lesson and the public-benchmark numbers. It is not foundational, but the empirical split and the self-reported limitation on integration are worth a referee's time to verify the numbers and check whether the embeddings really generalize as claimed. I would send it to review rather than desk-reject.

Referee Report

2 major / 1 minor

Summary. The paper investigates the combination of kinematic and appearance cues for multi-target pedestrian tracking in a tracking-by-detection framework. It reports that fine-tuning a convolutional detector and applying kinematic-only JIPDA yields first place on the 3DMOT2015 benchmark for fixed-camera sequences. For moving-camera sequences with unknown ego-motion, the authors claim best results are obtained by replacing kinematics with global nearest-neighbor association on deep correspondence embeddings extracted from the second block of a ResNet-18 fine-tuned with angular loss plus a margin term. Direct insertion of the same embeddings into JIPDA is reported to produce no significant gain, and the authors conclude that further work is needed on the geometry of embeddings for soft data association.

Significance. If the moving-camera results hold under additional validation, the work usefully demonstrates the breakdown of kinematic models under unknown ego-motion and the practical value of learned embeddings for appearance-based association when kinematics are unavailable. The top ranking on the public 3DMOT2015 benchmark is a concrete, reproducible strength that can be directly compared by other researchers. The explicit negative result on embedding integration into JIPDA is also valuable for guiding future work on hybrid association methods.

major comments (2)

[Abstract (moving-camera paragraph)] The claim that global nearest-neighbor tracking of the learned correspondence embeddings outperforms kinematics in moving-camera sequences is load-bearing for the paper's central contribution, yet rests on the untested assumption that embeddings from ResNet-18 block 2 remain sufficiently invariant to viewpoint changes and occlusions. The manuscript itself notes that direct insertion of the same embeddings into JIPDA produced no significant improvement; this observation is consistent with only marginal robustness and requires an explicit ablation on viewpoint-augmented training data or cross-validation across camera-motion regimes to substantiate the generalization claim.
[Abstract and results sections] The reported benchmark rankings that support the performance ordering between kinematic JIPDA and embedding-based NN tracking supply no error bars, statistical significance tests, or ablation tables. Without these, it is impossible to determine whether the observed ordering is robust to random seeds, detector variations, or sequence selection, weakening the evidence for the central claim that embeddings are preferable when ego-motion is unknown.

minor comments (1)

[Methods] The description of the angular loss and margin term would benefit from an explicit equation or pseudocode in the methods section to allow exact reproduction of the embedding training procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

Referee: [Abstract (moving-camera paragraph)] The claim that global nearest-neighbor tracking of the learned correspondence embeddings outperforms kinematics in moving-camera sequences is load-bearing for the paper's central contribution, yet rests on the untested assumption that embeddings from ResNet-18 block 2 remain sufficiently invariant to viewpoint changes and occlusions. The manuscript itself notes that direct insertion of the same embeddings into JIPDA produced no significant improvement; this observation is consistent with only marginal robustness and requires an explicit ablation on viewpoint-augmented training data or cross-validation across camera-motion regimes to substantiate the generalization claim.

Authors: The performance ordering is supported by the results on the 3DMOT2015 benchmark sequences with moving cameras, which feature real viewpoint changes, ego-motion, and occlusions. These sequences serve as a practical test of the embeddings' utility under the conditions described. We have explicitly noted the lack of improvement when integrating embeddings into JIPDA and concluded that further work on embedding geometry is required. We maintain that the benchmark results substantiate the claim for the evaluated scenarios without necessitating additional viewpoint-augmented ablations, which were not part of the original experimental design. revision: no
Referee: [Abstract and results sections] The reported benchmark rankings that support the performance ordering between kinematic JIPDA and embedding-based NN tracking supply no error bars, statistical significance tests, or ablation tables. Without these, it is impossible to determine whether the observed ordering is robust to random seeds, detector variations, or sequence selection, weakening the evidence for the central claim that embeddings are preferable when ego-motion is unknown.

Authors: We agree that the absence of error bars and statistical tests limits the assessment of robustness. The 3DMOT2015 benchmark uses a fixed set of sequences and a standardized evaluation, which is the conventional way to report and compare tracking performance. To strengthen the manuscript, we will revise the results section to include a brief discussion of these limitations and the deterministic nature of the reported rankings. This addresses the concern without requiring new experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results rest on external benchmarks

The manuscript is an empirical tracking paper that reports performance on the public 3DMOT2015 benchmark after fine-tuning a convolutional pedestrian detector and training correspondence embeddings on ResNet-18 features with angular loss. No derivation chain, uniqueness theorem, or fitted parameter is invoked to predict another quantity that is definitionally identical to the input. All load-bearing claims (e.g., superiority of global NN on embeddings when ego-motion is unknown) are evaluated by direct comparison on held-out benchmark test sequences rather than by algebraic reduction or self-citation. The single self-citation risk noted by the reader is minor and non-load-bearing.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Abstract supplies almost no explicit free parameters or invented entities; the approach rests on standard assumptions about detector fine-tuning and embedding suitability.

free parameters (1)

margin term in angular loss
Added to angular loss for embedding training; concrete value not reported.

axioms (2)

domain assumption Fine-tuning a convolutional detector yields accurate pedestrian detections on the target domain
Invoked for the fixed-camera pipeline.
domain assumption Deep features from ResNet-18 block 2 can be turned into identity-preserving correspondence embeddings
Central premise of the appearance branch.

pith-pipeline@v0.9.0 · 5738 in / 1246 out tokens · 22692 ms · 2026-05-24T20:49:26.682104+00:00 · methodology

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

[1] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” IJCV, 2015.

[2] T. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: common objects in context,” CoRR, 2014.

[3] S. Zhang, R. Benenson, M. Omran, J. H. Hosang, and B. Schiele, “How far are we from solving pedestrian detection?” in CVPR, 2016.

[4] S. Zhang, R. Benenson, and B. Schiele, “Citypersons: A diverse dataset for pedestrian detection,” in CVPR, 2017.

[5] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, “Signature verification using a siamese time delay neural network,” in NIPS, 1993.

[6] E. Hoffer and N. Ailon, “Deep metric learning using triplet network,” in Similarity-Based Pattern Recognition - Third International Workshop, SIMBAD, 2015.

[7] K. Sohn, “Improved deep metric learning with multi-class n-pair loss objective,” in NIPS, 2016.

[8] J. Wang, F. Zhou, S. Wen, X. Liu, and Y. Lin, “Deep metric learning with angular loss,” in ICCV, 2017.

[9] C. Song, Y. Huang, W. Ouyang, and L. Wang, “Mask-guided contrastive attention model for person re-identification,” in CVPR, 2018.

[10] B.-N. Vo, M. Mallick, Y. Bar-Shalom, S. Coraluppi, R. Osborne, R. Mahler, and B.-T. Vo, “Multitarget Tracking,” in Wiley Encyclopedia of Electrical and Electronics Engineering, 2015.

[11] Y. Bar-Shalom and E. Tse, “Tracking in a cluttered environment with probabilistic data association,” Automatica, 1975.

[12] T. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Sonar tracking of multiple targets using joint probabilistic data association,” IEEE Journal of Oceanic Engineering, 1983.

[13] D. Musicki and R. Evans, “Joint Integrated Probabilistic Data Association - JIPDA,” in Proceedings of the Fifth International Conference on Information Fusion (FUSION), 2002.

[14] D. Mušicki, R. Evans, and S. Stankovic, “Integrated probabilistic data association,” IEEE Transactions on Automatic Control, 1994.

[15] D. Reid, “An algorithm for tracking multiple targets,” IEEE Transactions on Automatic Control, 1979.

[16] S. S. Blackman, “Multiple hypothesis tracking for multiple target tracking,” IEEE Aerospace and Electronic Systems Magazine, 2004.

[17] I. R. Goodman, R. P. S. Mahler, and H. T. Nguyen, Mathematics of Data Fusion, Dordrecht, 1997.

[18] B.-N. Vo and W.-K. Ma, “The Gaussian Mixture Probability Hypothesis Density Filter,” IEEE Transactions on Signal Processing, 2006.

[19] R. P. Mahler, Statistical Multisource-Multitarget Information Fusion, 2007.

[20] S. Reuter, B. T. Vo, B. N. Vo, and K. Dietmayer, “The labeled multi-Bernoulli filter,” IEEE Transactions on Signal Processing, 2014.

[21] K. Krishanth, X. Chen, R. Tharmarasa, T. Kirubarajan, and M. McDonald, “The Social Force PHD Filter for Tracking Pedestrians,” IEEE Transactions on Aerospace and Electronic Systems, 2017.

[22] B. H. Wang, Y. Wang, K. Q. Weinberger, and M. Campbell, “Deep Person Re-identification for Probabilistic Data Association in Multiple Pedestrian Tracking,” arXiv:1810.08565, 2018.

[23] Y. Wang, L. Wang, Y. You, X. Zou, V. Chen, S. Li, G. Huang, B. Hariharan, and K. Q. Weinberger, “Resource Aware Person Re-identification across Multiple Resolutions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[24] T. Klinger, F. Rottensteiner, and C. Heipke, “Probabilistic multi-person tracking using dynamic bayes networks,” in ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2015.

[25] Z. Tang and J. Hwang, “Moana: An online learned adaptive appearance model for robust multiple object tracking in 3d,” IEEE Access, 2019.

[26] L. Leal-Taixé, A. Milan, I. Reid, S. Roth, and K. Schindler, “MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking,” 2015.

[27] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,” in ICCV, 2017.

[28] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in NIPS, 2015.

[29] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset,” in CVPRW, 2015.

[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.

[31] A. Milan, L. Leal-Taixe, I. Reid, S. Roth, and K. Schindler, “MOT16: A Benchmark for Multi-Object Tracking,” 2016.

[32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, 2014.

[33] S. H. Rezatofighi, A. Milan, Z. Zhang, Q. Shi, A. Dick, and I. Reid, “Joint Probabilistic Data Association Revisited,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015.

[34] T. Klinger, F. Rottensteiner, and C. Heipke, “Probabilistic multi-person localisation and tracking in image sequences,” ISPRS Journal of Photogrammetry and Remote Sensing, 2017.

[35] M. de Feo, A. Graziano, R. Miglioli, and A. Farina, “IMMJPDA versus MHT and Kalman filter with NN correlation: performance comparison,” IEE Proceedings - Radar, Sonar and Navigation, vol. 144, no. 2, 1997.

[36] D. Svensson, M. Ulmke, and L. Hammarstrand, “Multitarget sensor resolution model and joint probabilistic data association,” IEEE Transactions on Aerospace and Electronic Systems, vol. 48, no. 4, 2012.
