DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features

Aniket Bera; Dinesh Manocha; Rohan Chandra; Uttaran Bhattacharya

arxiv: 1906.10313 · v3 · pith:MWTXAA6Onew · submitted 2019-06-25 · 💻 cs.RO

Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

Pith reviewed 2026-05-25 16:56 UTC · model grok-4.3

classification 💻 cs.RO

keywords: pedestrian tracking, dense crowds, Front-RVO, motion model, sparse features, collision avoidance, Mask R-CNN

The pith

DensePeds tracks pedestrians in dense crowds using Front-RVO motion prediction and sparse features from Mask R-CNN.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DensePeds, a method for tracking individuals in crowds denser than two pedestrians per square meter, filmed from front-facing or elevated cameras. It introduces Front-RVO, a motion model based on collision avoidance, and pairs it with sparse feature vectors from Mask R-CNN to limit track losses. On standard MOT benchmarks the approach is reported to run 4.5 times faster than earlier algorithms, and on a new dense crowd dataset it raises accuracy by more than 2.6% absolute on average. This matters because most prior systems fragment tracks at such densities, so reliable per-person tracking there would be a practical advance. The central premise is that a tailored motion model plus efficient features can preserve track continuity without heavy computation.

Core claim

DensePeds uses Front-RVO, a motion model that incorporates collision avoidance constraints for predicting pedestrian movements from front-facing cameras, together with sparse feature vectors computed by Mask R-CNN, to reduce false negatives (lost tracks). The combination is reported to run 4.5 times faster than prior trackers on the MOT benchmark and to improve on state-of-the-art accuracy in dense crowd videos by over 2.6% absolute on average.

What carries the argument

Front-RVO, a motion model encoding collision avoidance constraints for front-view pedestrian prediction, integrated with sparse feature vectors from Mask R-CNN to sustain track continuity.
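To make "collision avoidance constraints" concrete, here is a minimal sketch of the classic velocity-obstacle test that RVO-style models build on. It is not the paper's Front-RVO formulation: it assumes circular agents, constant velocities, and a hypothetical `horizon` parameter, and the function name is chosen for illustration only.

```python
import math


def in_velocity_obstacle(p_a, p_b, v_a, v_b, r_a, r_b, horizon=5.0):
    """Return True if agent A, keeping velocity v_a, would come within
    r_a + r_b of agent B (moving at v_b) within `horizon` seconds.

    Assumption: both agents are discs and move at constant velocity.
    """
    # Relative position of B as seen from A, and A's velocity relative to B.
    px, py = p_b[0] - p_a[0], p_b[1] - p_a[1]
    vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]
    r = r_a + r_b

    c = px * px + py * py - r * r
    if c <= 0.0:
        return True  # the discs already overlap

    a = vx * vx + vy * vy
    b = px * vx + py * vy
    if a == 0.0 or b <= 0.0:
        return False  # no relative motion, or the agents are moving apart

    # Solve |p - v t|^2 = r^2, i.e. a t^2 - 2 b t + c = 0, for the first contact time.
    disc = b * b - a * c
    if disc < 0.0:
        return False  # the relative velocity misses the combined disc
    t_contact = (b - math.sqrt(disc)) / a
    return t_contact <= horizon


if __name__ == "__main__":
    # Head-on approach toward a stationary pedestrian 4 m ahead.
    print(in_velocity_obstacle((0, 0), (4, 0), (1, 0), (0, 0), 0.3, 0.3))
    # Same speed, but the other pedestrian stands 2 m to the side.
    print(in_velocity_obstacle((0, 0), (4, 2), (1, 0), (0, 0), 0.3, 0.3))
```

A motion model of this family would reject or penalize candidate velocities for which the test returns True, then predict the pedestrian's next position from the remaining ones.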

If this is right

Individual pedestrian tracks remain stable longer when density exceeds two people per square meter.
Processing runs fast enough for real-time operation on standard hardware.
Fewer track losses support downstream uses such as crowd counting or behavior analysis.
The method combines readily with existing detection networks without requiring new end-to-end training.
Gains hold across both standard MOT sequences and specialized dense-crowd test sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The collision-avoidance formulation in Front-RVO could be adapted for side or overhead views if perspective effects are modeled.
Sparse features might reduce compute in other real-time vision pipelines that currently rely on dense descriptors.
Creation of the new dense dataset highlights the value of targeted benchmarks for extreme conditions.
If the speed advantage persists, hybrid motion-plus-feature designs may compete with heavier learned models in resource-constrained settings.

Load-bearing premise

The new dense crowd dataset and the evaluation protocol used to measure the accuracy gain accurately reflect real-world performance without post-hoc selection of test sequences or metrics.

What would settle it

An independent test on a separate collection of dense crowd videos that shows no accuracy or speed advantage over existing methods would undercut the performance claims.

Figures

Figures reproduced from arXiv: 1906.10313 by Aniket Bera, Dinesh Manocha, Rohan Chandra, Uttaran Bhattacharya.

**Figure 3.** (Left) A circular representation results in many false δ-overlaps. (Right) FRVO efficiently models pedestrians using elliptical representations that on average cause δ → 0. Sequence: IITF-1. ∆ is upper-bounded by a function of δ. Observe that the error bound increases for higher δ. This is interpreted as follows: the more we increase the δ-overlap, the transparent gray cone will correspondingly shrink, and…

**Figure 2.** (Left) Standard VO configuration fundamental to RVO, using…

**Figure 4.** Overview of our real-time pedestrian tracking algorithm, DensePeds.

**Figure 5.** Qualitative analysis of DensePeds on the NPLACE-2 sequence consisting of 144 pedestrians in the video. Frames are chosen with a gap of 4…
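The Figure 3 caption contrasts circular and elliptical pedestrian footprints. The sketch below illustrates why a circle can report an overlap that an ellipse does not. It is an illustration only and does not reproduce the paper's δ-overlap definition: the body dimensions are hypothetical, and both ellipses are assumed congruent and identically oriented, in which case they overlap exactly when the center offset falls inside the same ellipse scaled by two.

```python
def circles_overlap(dx, dy, radius):
    """Two equal discs overlap when their centers are closer than 2 * radius."""
    return dx * dx + dy * dy < (2.0 * radius) ** 2


def ellipses_overlap(dx, dy, a, b):
    """Overlap test for two congruent ellipses with the same orientation.

    a: semi-axis along x (here: half shoulder width)
    b: semi-axis along y (here: half body depth, the walking direction)
    """
    return (dx / (2.0 * a)) ** 2 + (dy / (2.0 * b)) ** 2 < 1.0


if __name__ == "__main__":
    # Hypothetical dimensions in meters: shoulders 0.5 wide, body 0.3 deep.
    half_width, half_depth = 0.25, 0.15
    # Two pedestrians walking in single file, 0.4 m apart along y.
    dx, dy = 0.0, 0.4
    print("circle says overlap: ", circles_overlap(dx, dy, half_width))
    print("ellipse says overlap:", ellipses_overlap(dx, dy, half_width, half_depth))
```

In this configuration the disc wide enough to cover the shoulders flags a collision between two people who are merely queued closely, while the ellipse does not.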

Original abstract

We present a pedestrian tracking algorithm, DensePeds, that tracks individuals in highly dense crowds (greater than 2 pedestrians per square meter). Our approach is designed for videos captured from front-facing or elevated cameras. We present a new motion model called Front-RVO (FRVO) for predicting pedestrian movements in dense situations using collision avoidance constraints and combine it with state-of-the-art Mask R-CNN to compute sparse feature vectors that reduce the loss of pedestrian tracks (false negatives). We evaluate DensePeds on the standard MOT benchmarks as well as a new dense crowd dataset. In practice, our approach is 4.5 times faster than prior tracking algorithms on the MOT benchmark and we are state-of-the-art in dense crowd videos by over 2.6% on the absolute scale on average.
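The abstract does not say how feature vectors are matched to tracks, so the following is only a sketch of one common way appearance features are used to keep identities: compute a cosine distance between each track's stored feature and each new detection's feature, then pick the assignment with the lowest total cost. The brute-force search, the `max_dist` threshold, and all names are assumptions for illustration; production trackers typically solve this step with the Hungarian method instead.

```python
import itertools
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def match_tracks(track_feats, det_feats, max_dist=0.5):
    """Return (track_index, detection_index) pairs with minimum total distance.

    Brute force over permutations, so only suitable for a handful of
    pedestrians. Pairs farther apart than max_dist are discarded.
    """
    n_t, n_d = len(track_feats), len(det_feats)
    cost = [[cosine_distance(t, d) for d in det_feats] for t in track_feats]
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), perm))
                      for perm in itertools.permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(perm, range(n_d)))
                      for perm in itertools.permutations(range(n_t), n_d))
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    return [(i, j) for i, j in best if cost[i][j] <= max_dist]


if __name__ == "__main__":
    tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    detections = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(match_tracks(tracks, detections))
```

A track left without a match under the threshold is a candidate false negative, which is the failure the abstract says the sparse features are meant to reduce.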

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Front-RVO is a reasonable incremental motion model for dense tracking, but the 2.6% SOTA claim rests on an undescribed new dataset whose independence from the method is unclear.

The main things to know are that the paper introduces Front-RVO, a collision-avoidance motion model tailored to front or elevated views in crowds denser than 2 people per square meter, and combines it with sparse Mask R-CNN features to reduce track loss. It reports 4.5 times faster runtime than prior trackers on the MOT benchmark and a 2.6% absolute gain on a new dense-crowd dataset they created. The speed result on an established benchmark is the more solid part; the motion model itself is a direct adaptation of existing reciprocal velocity obstacle ideas to the front-camera case, which is a modest but useful specialization. The integration with detection features is straightforward and addresses a real failure mode in dense scenes.

The soft spot is the state-of-the-art claim on dense videos. The abstract supplies no information on dataset collection, sequence selection criteria, density distribution, or train/test split, so it is impossible to tell whether the reported gain reflects genuine improvement or favorable test conditions. Without those details the central performance number cannot be taken at face value.

This paper is for people working on practical multi-object tracking for robotics or surveillance who already know the MOT literature and want a ready-to-try motion model for high-density cases. It is not foundational, but the applied focus and the speed number on MOT make it worth a referee's time to verify the dataset and baselines.

Referee Report

2 major / 1 minor

Summary. The paper presents DensePeds, a pedestrian tracking algorithm for dense crowds (>2 pedestrians per m²) captured from front-facing or elevated cameras. It introduces the Front-RVO motion model incorporating collision avoidance and combines it with Mask R-CNN to generate sparse feature vectors that reduce track loss. The method is evaluated on standard MOT benchmarks and a newly introduced dense crowd dataset, claiming 4.5× faster runtime than prior trackers on MOT and state-of-the-art performance on dense videos by >2.6% on average.

Significance. If the performance claims hold under independent evaluation, the work could improve tracking reliability in high-density scenarios relevant to surveillance and autonomous systems. The introduction of a dedicated dense-crowd dataset is a potential contribution, though its value depends on transparent collection and split protocols. The reported speed-up on an established benchmark is a concrete, verifiable strength.

major comments (2)

[Evaluation on new dense crowd dataset (abstract and results section)] The central SOTA claim of >2.6% absolute improvement on dense videos rests entirely on the newly introduced dataset. No details are provided on dataset collection procedure, sequence selection criteria, density statistics across sequences, or the train/test split protocol, making it impossible to determine whether the reported gain is independent of method-specific tuning or post-hoc sequence choice.
[Method and results] The abstract states that Front-RVO is combined with Mask R-CNN sparse features to reduce false negatives, yet no ablation is referenced that isolates the contribution of each component on the dense dataset; without such controls the attribution of the 2.6% gain remains unclear.

minor comments (1)

[Abstract] The abstract claims 'state-of-the-art in dense crowd videos by over 2.6% on the absolute scale on average' without naming the competing methods or the exact metric (MOTA, IDF1, etc.) used for the comparison.
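The metric question matters because the common tracking scores weigh errors differently. As a reference point, the sketch below computes MOTA as defined in the CLEAR MOT metrics and shows what an absolute gain means in percentage points. The abstract does not state that MOTA is the metric behind the 2.6% figure, and the error counts used here are made-up numbers for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy (CLEAR MOT).

    MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame.
    The value can be negative when errors outnumber ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def absolute_gain_points(score_new, score_old):
    """Absolute difference between two scores, in percentage points."""
    return 100.0 * (score_new - score_old)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    baseline = mota(false_negatives=140, false_positives=50, id_switches=10, num_gt=1000)
    improved = mota(false_negatives=115, false_positives=50, id_switches=9, num_gt=1000)
    print(f"baseline MOTA: {baseline:.3f}")
    print(f"improved MOTA: {improved:.3f}")
    print(f"absolute gain: {absolute_gain_points(improved, baseline):.1f} points")
```

Reporting the metric name together with the raw error counts would let a reader see whether a gain comes from fewer missed pedestrians, fewer false alarms, or fewer identity switches.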

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to provide the requested details and analyses.

Referee: [Evaluation on new dense crowd dataset (abstract and results section)] The central SOTA claim of >2.6% absolute improvement on dense videos rests entirely on the newly introduced dataset. No details are provided on dataset collection procedure, sequence selection criteria, density statistics across sequences, or the train/test split protocol, making it impossible to determine whether the reported gain is independent of method-specific tuning or post-hoc sequence choice.

Authors: We agree that the current manuscript lacks sufficient documentation on the new dense crowd dataset. In the revision we will add a dedicated subsection that describes the collection procedure, sequence selection criteria, per-sequence density statistics, and the train/test split protocol. This addition will allow readers to assess the independence of the reported gains. revision: yes
Referee: [Method and results] The abstract states that Front-RVO is combined with Mask R-CNN sparse features to reduce false negatives, yet no ablation is referenced that isolates the contribution of each component on the dense dataset; without such controls the attribution of the 2.6% gain remains unclear.

Authors: We acknowledge that the manuscript does not contain an ablation isolating Front-RVO from the Mask R-CNN sparse features on the dense dataset. We will perform the required experiments and include an ablation study in the revised results section that reports tracking metrics with and without each component. revision: yes

Circularity Check

0 steps flagged

No circularity in performance claims or derivation

The paper describes an algorithmic pipeline (Front-RVO motion model combined with Mask R-CNN sparse features) whose outputs are measured via standard tracking metrics on the established MOT benchmark and a newly introduced dense-crowd dataset. No equations, predictions, or first-principles results are presented that reduce by construction to fitted parameters, self-citations, or renamed inputs; the reported speed-up (4.5×) and accuracy gain (2.6%) are direct empirical measurements rather than tautological restatements of the method itself. The evaluation protocol therefore remains independent of the claimed results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described beyond the introduction of the Front-RVO model itself.

pith-pipeline@v0.9.0 · 5676 in / 1156 out tokens · 17093 ms · 2026-05-25T16:56:57.668959+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 3 internal anchors

[1] A. Bera, T. Randhavane, and D. Manocha, “Aggressive, tense or shy? identifying personality traits from crowd videos,” in IJCAI, 2017.
[2] R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha, “Traphic: Trajectory prediction in dense and heterogeneous traffic using weighted interactions,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[3] A. Teichman and S. Thrun, “Practical object recognition in autonomous driving and beyond,” in Advanced Robotics and its Social Impacts, pp. 35–38, IEEE, 2011.
[4] L. Zhang and L. van der Maaten, “Structure preserving object tracking,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1838–1845, 2013.
[5] W. Hu, X. Li, W. Luo, X. Zhang, S. Maybank, and Z. Zhang, “Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 12, pp. 2420–2440, 2012.
[6] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” in Robotics research, pp. 3–19, Springer, 2011.
[7] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” ArXiv e-prints, Mar. 2017.
[8] A. Milan, L. Leal-Taixe, I. Reid, S. Roth, and K. Schindler, “MOT16: A Benchmark for Multi-Object Tracking,” ArXiv e-prints, Mar. 2016.
[9] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR
[10] IEEE Computer Society Conference on, vol. 1, pp. 886–893, IEEE, 2005
[11] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” ArXiv e-prints, Nov. 2013.
[13] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, pp. 1440–1448, 2015.
[14] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” ArXiv e-prints, June 2015.
[15] R. Li, X. Dong, Z. Cai, D. Yang, H. Huang, S.-H. Zhang, P. Rosin, and S.-M. Hu, “Pose2seg: Human instance segmentation without detection,” arXiv preprint arXiv:1803.10683, 2018.
[16] A. Milan, S. H. Rezatofighi, A. Dick, I. Reid, and K. Schindler, “Online multi-target tracking using recurrent neural networks,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[17] S.-H. Bae and K.-J. Yoon, “Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 3, pp. 595–610, 2018.
[18] Q. Chu, W. Ouyang, H. Li, X. Wang, B. Liu, and N. Yu, “Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism,” 2017.
[19] K. Fang, Y. Xiang, and S. Savarese, “Recurrent autoregressive networks for online multi-object tracking,” arXiv preprint arXiv:1711.02741, 2017.
[20] N. Wojke, A. Bewley, and D. Paulus, “Simple Online and Realtime Tracking with a Deep Association Metric,” ArXiv e-prints, Mar. 2017.
[21] S. Murray, “Real-time multiple object tracking - a study on the importance of speed,” arXiv preprint arXiv:1709.03572, 2017.
[22] A. Milan, S. Roth, and K. Schindler, “Continuous energy minimization for multitarget tracking,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 1, pp. 58–72, 2013.
[23] C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis tracking revisited,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4696–4704, 2015.
[24] R. Henschel, L. Leal-Taixe, D. Cremers, and B. Rosenhahn, “Fusion of head and full-body detectors for multi-object tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1428–1437, 2018.
[25] H. Sheng, L. Hao, J. Chen, Y. Zhang, and W. Ke, “Robust local effective matching model for multi-target tracking,” in Pacific Rim Conference on Multimedia, pp. 233–243, Springer, 2017.
[26] D. Reid, “An algorithm for tracking multiple targets,” IEEE transactions on Automatic Control, vol. 24, no. 6, pp. 843–854, 1979.
[27] A. Bera and D. Manocha, “Realtime multilevel crowd tracking using reciprocal velocity obstacles,” in Pattern Recognition (ICPR), 2014 22nd International Conference on, pp. 4164–4169, IEEE, 2014.
[28] R. Chandra, U. Bhattacharya, T. Randhavane, A. Bera, and D. Manocha, “Roadtrack: Tracking road agents in dense and heterogeneous environments,” arXiv preprint arXiv:1906.10712, 2019.
[29] A. Bera, N. Galoppo, D. Sharlet, A. Lake, and D. Manocha, “Adapt: real-time adaptive pedestrian tracking for crowded scenes,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 1801–1808, IEEE, 2014.
[30] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You’ll never walk alone: Modeling social behavior for multi-target tracking,” in 2009 IEEE 12th International Conference on Computer Vision, pp. 261–268, Sept 2009.
[31] K. Yamaguchi, A. C. Berg, L. E. Ortiz, and T. L. Berg, “Who are you with and where are you going?,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1345–1352, IEEE, 2011.
[32] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical review E, vol. 51, no. 5, p. 4282, 1995.
[33] I. Karamouzas, P. Heil, P. Van Beek, and M. H. Overmars, “A predictive collision avoidance model for pedestrian simulation,” in International Workshop on Motion in Games, pp. 41–52, Springer, 2009.
[34] G. Antonini, S. V. Martinez, M. Bierlaire, and J. P. Thiran, “Behavioral priors for detection and tracking of pedestrians in video sequences,” International Journal of Computer Vision, vol. 69, no. 2, pp. 159–180, 2006.
[35] S.-Y. Chung and H.-P. Huang, “A mobile robot that understands pedestrian spatial behaviors,” in Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pp. 5861–5866, IEEE, 2010.
[36] A. Best, S. Narang, and D. Manocha, “Real-time reciprocal collision avoidance with elliptical agents,” in Robotics and Automation (ICRA), 2016 IEEE International Conference on, pp. 298–305, IEEE, 2016.
[37] P. Fiorini and Z. Shiller, “Motion planning in dynamic environments using velocity obstacles,” The International Journal of Robotics Research, vol. 17, no. 7, pp. 760–772, 1998.
[38] H. W. Kuhn, “The hungarian method for the assignment problem,” in 50 Years of Integer Programming 1958-2008, pp. 29–47, Springer, 2010.
[39] M. Levandowsky and D. Winter, “Distance between sets,” Nature, vol. 234, no. 5323, p. 34, 1971.
[40] C. Long, A. Haizhou, Z. Zijie, and S. Chong, “Real-time multiple people tracking with deeply learned candidate selection and person re-identification,” in ICME, 2018.
[41] Y. Xiang, A. Alahi, and S. Savarese, “Learning to track: Online multi-object tracking by decision making,” in Proceedings of the IEEE international conference on computer vision, pp. 4705–4713, 2015.
[42] A. Sadeghian, A. Alahi, and S. Savarese, “Tracking the untrackable: Learning to track multiple cues with long-term dependencies,”
[43] M. Yang, Y. Wu, and Y. Jia, “A hybrid data association framework for robust online multi-object tracking,” arXiv preprint arXiv:1703.10764, 2017.
[44] L. Chen, H. Ai, C. Shang, Z. Zhuang, and B. Bai, “Online multi-object tracking with convolutional neural networks,” in Image Processing (ICIP), 2017 IEEE International Conference on, pp. 645–649, IEEE, 2017.
[45] R. Sanchez-Matilla, F. Poiesi, and A. Cavallaro, “Online multi-target tracking with strong and weak detections,” in European Conference on Computer Vision, pp. 84–99, Springer, 2016.
[46] K. Fang, Y. Xiang, and S. Savarese, “Recurrent autoregressive networks for online multi-object tracking,” 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 466–475, 2018.
[47] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
[48] L. Patino, T. Cane, A. Vallee, and J. Ferryman, “Pets 2016: Dataset and challenge,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8, 2016.
[49] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: the clear mot metrics,” Journal on Image and Video Processing, vol. 2008, p. 1, 2008.

[1] [1]

Aggressive, tense or shy? identifying personality traits from crowd videos,

A. Bera, T. Randhavane, and D. Manocha, “Aggressive, tense or shy? identifying personality traits from crowd videos,” in IJCAI, 2017

work page 2017

[2] [2]

Traphic: Tra- jectory prediction in dense and heterogeneous trafﬁc using weighted interactions,

R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha, “Traphic: Tra- jectory prediction in dense and heterogeneous trafﬁc using weighted interactions,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

work page 2019

[3] [3]

Practical object recognition in au- tonomous driving and beyond,

A. Teichman and S. Thrun, “Practical object recognition in au- tonomous driving and beyond,” in Advanced Robotics and its Social Impacts, pp. 35–38, IEEE, 2011

work page 2011

[4] [4]

Structure preserving object track- ing,

L. Zhang and L. van der Maaten, “Structure preserving object track- ing,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1838–1845, 2013

work page 2013

[5] [5]

Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model,

W. Hu, X. Li, W. Luo, X. Zhang, S. Maybank, and Z. Zhang, “Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 34, no. 12, pp. 2420–2440, 2012

work page 2012

[6] [6]

Reciprocal n-body collision avoidance,

J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” in Robotics research, pp. 3–19, Springer, 2011

work page 2011

[7] [7]

Mask R-CNN,

K. He, G. Gkioxari, P. Doll ´ar, and R. Girshick, “Mask R-CNN,” ArXiv e-prints, Mar. 2017

work page 2017

[8] [8]

MOT16: A Benchmark for Multi-Object Tracking,

A. Milan, L. Leal-Taixe, I. Reid, S. Roth, and K. Schindler, “MOT16: A Benchmark for Multi-Object Tracking,” ArXiv e-prints, Mar. 2016

work page 2016

[9] [9]

Histograms of oriented gradients for human detection,

N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR

work page 2005

[10] [10]

IEEE Computer Society Conference on , vol. 1, pp. 886–893, IEEE, 2005

work page 2005

[11] [11]

Distinctive image features from scale-invariant key- points,

D. G. Lowe, “Distinctive image features from scale-invariant key- points,” International journal of computer vision , vol. 60, no. 2, pp. 91–110, 2004

work page 2004

[12] [12]

Rich feature hierarchies for accurate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” ArXiv e-prints, Nov. 2013

work page 2013

[13] [13]

Fast r-cnn,

R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision , pp. 1440–1448, 2015

work page 2015

[14] [14]

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,

S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” ArXiv e-prints, June 2015

work page 2015

[15] [15]

Pose2seg: Human instance segmentation without detection,

R. Li, X. Dong, Z. Cai, D. Yang, H. Huang, S.-H. Zhang, P. Rosin, and S.-M. Hu, “Pose2seg: Human instance segmentation without detection,” arXiv preprint arXiv:1803.10683 , 2018

work page arXiv 2018

[16] [16]

Online multi-target tracking using recurrent neural networks,

A. Milan, S. H. Rezatoﬁghi, A. Dick, I. Reid, and K. Schindler, “Online multi-target tracking using recurrent neural networks,” in Thirty-First AAAI Conference on Artiﬁcial Intelligence , 2017

work page 2017

[17] [17]

Conﬁdence-based data association and discriminative deep appearance learning for robust online multi- object tracking,

S.-H. Bae and K.-J. Yoon, “Conﬁdence-based data association and discriminative deep appearance learning for robust online multi- object tracking,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 3, pp. 595–610, 2018

work page 2018

[18] [18]

Online multi- object tracking using cnn-based single object tracker with spatial- temporal attention mechanism,

Q. Chu, W. Ouyang, H. Li, X. Wang, B. Liu, and N. Yu, “Online multi- object tracking using cnn-based single object tracker with spatial- temporal attention mechanism,” 2017

work page 2017

[19] [19]

Recurrent Autoregressive Networks for Online Multi-Object Tracking

K. Fang, Y . Xiang, and S. Savarese, “Recurrent autoregres- sive networks for online multi-object tracking,” arXiv preprint arXiv:1711.02741, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[20] [20]

Simple Online and Realtime Tracking with a Deep Association Metric,

N. Wojke, A. Bewley, and D. Paulus, “Simple Online and Realtime Tracking with a Deep Association Metric,” ArXiv e-prints, Mar. 2017

work page 2017

[21] [21]

Real-Time Multiple Object Tracking - A Study on the Importance of Speed

S. Murray, “Real-time multiple object tracking-a study on the impor- tance of speed,” arXiv preprint arXiv:1709.03572 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[22] [22]

Continuous energy minimization for multitarget tracking,

A. Milan, S. Roth, and K. Schindler, “Continuous energy minimization for multitarget tracking,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 1, pp. 58–72, 2013

work page 2013

[23] [23]

Multiple hypothesis track- ing revisited,

C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis track- ing revisited,” in Proceedings of the IEEE International Conference on Computer Vision , pp. 4696–4704, 2015

work page 2015

[24] [24]

Fusion of head and full-body detectors for multi-object tracking,

R. Henschel, L. Leal-Taixe, D. Cremers, and B. Rosenhahn, “Fusion of head and full-body detectors for multi-object tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1428–1437, 2018

work page 2018

[25] [25]

Robust local effective matching model for multi-target tracking,

H. Sheng, L. Hao, J. Chen, Y . Zhang, and W. Ke, “Robust local effective matching model for multi-target tracking,” in Paciﬁc Rim Conference on Multimedia , pp. 233–243, Springer, 2017

work page 2017

[26] D. Reid, “An algorithm for tracking multiple targets,” IEEE Transactions on Automatic Control, vol. 24, no. 6, pp. 843–854, 1979

[27] A. Bera and D. Manocha, “Realtime multilevel crowd tracking using reciprocal velocity obstacles,” in Pattern Recognition (ICPR), 2014 22nd International Conference on, pp. 4164–4169, IEEE, 2014

[28] R. Chandra, U. Bhattacharya, T. Randhavane, A. Bera, and D. Manocha, “Roadtrack: Tracking road agents in dense and heterogeneous environments,” arXiv preprint arXiv:1906.10712, 2019

[29] A. Bera, N. Galoppo, D. Sharlet, A. Lake, and D. Manocha, “Adapt: real-time adaptive pedestrian tracking for crowded scenes,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, pp. 1801–1808, IEEE, 2014

[30] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You’ll never walk alone: Modeling social behavior for multi-target tracking,” in 2009 IEEE 12th International Conference on Computer Vision, pp. 261–268, Sept 2009

[31] K. Yamaguchi, A. C. Berg, L. E. Ortiz, and T. L. Berg, “Who are you with and where are you going?,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1345–1352, IEEE, 2011

[32] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical Review E, vol. 51, no. 5, p. 4282, 1995

[33] I. Karamouzas, P. Heil, P. Van Beek, and M. H. Overmars, “A predictive collision avoidance model for pedestrian simulation,” in International Workshop on Motion in Games, pp. 41–52, Springer, 2009

[34] G. Antonini, S. V. Martinez, M. Bierlaire, and J. P. Thiran, “Behavioral priors for detection and tracking of pedestrians in video sequences,” International Journal of Computer Vision, vol. 69, no. 2, pp. 159–180, 2006

[35] S.-Y. Chung and H.-P. Huang, “A mobile robot that understands pedestrian spatial behaviors,” in Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pp. 5861–5866, IEEE, 2010

[36] A. Best, S. Narang, and D. Manocha, “Real-time reciprocal collision avoidance with elliptical agents,” in Robotics and Automation (ICRA), 2016 IEEE International Conference on, pp. 298–305, IEEE, 2016

[37] P. Fiorini and Z. Shiller, “Motion planning in dynamic environments using velocity obstacles,” The International Journal of Robotics Research, vol. 17, no. 7, pp. 760–772, 1998

[38] H. W. Kuhn, “The Hungarian method for the assignment problem,” in 50 Years of Integer Programming 1958-2008, pp. 29–47, Springer, 2010

[39] M. Levandowsky and D. Winter, “Distance between sets,” Nature, vol. 234, no. 5323, p. 34, 1971

[40] C. Long, A. Haizhou, Z. Zijie, and S. Chong, “Real-time multiple people tracking with deeply learned candidate selection and person re-identification,” in ICME, 2018

[41] Y. Xiang, A. Alahi, and S. Savarese, “Learning to track: Online multi-object tracking by decision making,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4705–4713, 2015

[42] A. Sadeghian, A. Alahi, and S. Savarese, “Tracking the untrackable: Learning to track multiple cues with long-term dependencies,”

[43] M. Yang, Y. Wu, and Y. Jia, “A hybrid data association framework for robust online multi-object tracking,” arXiv preprint arXiv:1703.10764, 2017

[44] L. Chen, H. Ai, C. Shang, Z. Zhuang, and B. Bai, “Online multi-object tracking with convolutional neural networks,” in Image Processing (ICIP), 2017 IEEE International Conference on, pp. 645–649, IEEE, 2017

[45] R. Sanchez-Matilla, F. Poiesi, and A. Cavallaro, “Online multi-target tracking with strong and weak detections,” in European Conference on Computer Vision, pp. 84–99, Springer, 2016

[46] K. Fang, Y. Xiang, and S. Savarese, “Recurrent autoregressive networks for online multi-object tracking,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 466–475, 2018

[47] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013

[48] L. Patino, T. Cane, A. Vallee, and J. Ferryman, “PETS 2016: Dataset and challenge,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8, 2016

[49] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: the CLEAR MOT metrics,” Journal on Image and Video Processing, vol. 2008, p. 1, 2008