DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features
Pith reviewed 2026-05-25 16:56 UTC · model grok-4.3
The pith
DensePeds tracks pedestrians in dense crowds using Front-RVO motion prediction and sparse features from Mask R-CNN.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DensePeds uses Front-RVO, a motion model that incorporates collision avoidance constraints to predict pedestrian movements in video from front-facing or elevated cameras, together with sparse feature vectors computed by Mask R-CNN, to reduce false negatives (lost pedestrian tracks). The paper reports that this combination runs 4.5 times faster than prior trackers on the MOT benchmark and beats the state of the art on dense crowd videos by over 2.6% absolute on average.
What carries the argument
Front-RVO, a motion model encoding collision avoidance constraints for front-view pedestrian prediction, integrated with sparse feature vectors from Mask R-CNN to sustain track continuity.
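For intuition about what a collision avoidance constraint tests, here is a minimal sketch of a generic velocity-obstacle check between two circular agents. It is not the paper's Front-RVO formulation, which this page does not spell out. The function name `will_collide`, the circular-agent assumption, and the `horizon` parameter are all illustrative choices.

```python
import math


def will_collide(p_a, v_a, p_b, v_b, radius_sum, horizon):
    """Return True if two circular agents come closer than radius_sum
    within `horizon` seconds, assuming both keep their current velocity.

    Illustrative only: a generic velocity-obstacle test, not Front-RVO.
    Positions and velocities are (x, y) tuples in a shared ground plane.
    """
    # Position of B relative to A, and velocity of A relative to B.
    px, py = p_b[0] - p_a[0], p_b[1] - p_a[1]
    vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]

    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # No relative motion: the separation never changes.
        t_closest = 0.0
    else:
        # Time of closest approach, clamped to the look-ahead window.
        t_closest = max(0.0, min(horizon, (px * vx + py * vy) / speed_sq))

    # Separation at the moment of closest approach.
    dx, dy = px - vx * t_closest, py - vy * t_closest
    return math.hypot(dx, dy) < radius_sum


# A walks straight at a stationary B two meters ahead.
head_on = will_collide((0, 0), (1, 0), (2, 0), (0, 0), 0.6, 3.0)
# Same geometry, but A veers diagonally and passes B at a distance.
veering = will_collide((0, 0), (1, 1), (2, 0), (0, 0), 0.6, 3.0)
```

A motion model built on this idea would discard or penalize predicted velocities for which the check returns True, which is the general sense in which collision avoidance can constrain motion prediction in a dense crowd.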
If this is right
- Individual pedestrian tracks remain stable longer when density exceeds two people per square meter.
- Processing runs fast enough for real-time operation on standard hardware.
- Fewer track losses support downstream uses such as crowd counting or behavior analysis.
- The method combines readily with existing detection networks without requiring new end-to-end training.
- Gains hold across both standard MOT sequences and specialized dense-crowd test sets.
Where Pith is reading between the lines
- The collision-avoidance formulation in Front-RVO could be adapted for side or overhead views if perspective effects are modeled.
- Sparse features might reduce compute in other real-time vision pipelines that currently rely on dense descriptors.
- Creation of the new dense dataset highlights the value of targeted benchmarks for extreme conditions.
- If the speed advantage persists, hybrid motion-plus-feature designs may compete with heavier learned models in resource-constrained settings.
Load-bearing premise
The new dense crowd dataset and the evaluation protocol used to measure the accuracy gain accurately reflect real-world performance without post-hoc selection of test sequences or metrics.
What would settle it
An independent test on a separate collection of dense crowd videos that shows no accuracy or speed advantage over existing methods would disprove the performance claims.
Original abstract
We present a pedestrian tracking algorithm, DensePeds, that tracks individuals in highly dense crowds (greater than 2 pedestrians per square meter). Our approach is designed for videos captured from front-facing or elevated cameras. We present a new motion model called Front-RVO (FRVO) for predicting pedestrian movements in dense situations using collision avoidance constraints and combine it with state-of-the-art Mask R-CNN to compute sparse feature vectors that reduce the loss of pedestrian tracks (false negatives). We evaluate DensePeds on the standard MOT benchmarks as well as a new dense crowd dataset. In practice, our approach is 4.5 times faster than prior tracking algorithms on the MOT benchmark and we are state-of-the-art in dense crowd videos by over 2.6% on the absolute scale on average.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents DensePeds, a pedestrian tracking algorithm for dense crowds (>2 pedestrians per m²) captured from front-facing or elevated cameras. It introduces the Front-RVO motion model incorporating collision avoidance and combines it with Mask R-CNN to generate sparse feature vectors that reduce track loss. The method is evaluated on standard MOT benchmarks and a newly introduced dense crowd dataset, claiming 4.5× faster runtime than prior trackers on MOT and state-of-the-art performance on dense videos by >2.6% on average.
Significance. If the performance claims hold under independent evaluation, the work could improve tracking reliability in high-density scenarios relevant to surveillance and autonomous systems. The introduction of a dedicated dense-crowd dataset is a potential contribution, though its value depends on transparent collection and split protocols. The reported speed-up on an established benchmark is a concrete, verifiable strength.
major comments (2)
- [Evaluation on new dense crowd dataset (abstract and results section)] The central SOTA claim of >2.6% absolute improvement on dense videos rests entirely on the newly introduced dataset. No details are provided on dataset collection procedure, sequence selection criteria, density statistics across sequences, or the train/test split protocol, making it impossible to determine whether the reported gain is independent of method-specific tuning or post-hoc sequence choice.
- [Method and results] The abstract states that Front-RVO is combined with Mask R-CNN sparse features to reduce false negatives, yet no ablation is referenced that isolates the contribution of each component on the dense dataset; without such controls the attribution of the 2.6% gain remains unclear.
minor comments (1)
- [Abstract] The abstract claims 'state-of-the-art in dense crowd videos by over 2.6% on the absolute scale on average' without naming the competing methods or the exact metric (MOTA, IDF1, etc.) used for the comparison.
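To make concrete why the choice of metric matters for an "absolute" gain, the sketch below computes MOTA from the CLEAR MOT definition (one minus the ratio of false negatives, false positives, and identity switches to ground-truth objects). Whether MOTA is the metric behind the 2.6% figure is an assumption here, since the abstract does not say, and the counts below are invented purely for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy per the CLEAR MOT definition.

    All counts are summed over every frame of the evaluated sequences.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts, NOT taken from the paper.
baseline = mota(false_negatives=300, false_positives=100,
                id_switches=20, num_ground_truth=1000)
# Same tracker quality except for 26 fewer lost detections.
improved = mota(false_negatives=274, false_positives=100,
                id_switches=20, num_ground_truth=1000)

# Absolute gain in percentage points, as opposed to a relative gain.
absolute_gain_points = (improved - baseline) * 100
```

With these made-up counts, removing 26 false negatives out of 1000 ground-truth objects moves MOTA by about 2.6 points, which shows how a reduction in lost tracks could translate into a gain of the reported size under this metric. The same reduction would register differently under IDF1, which is why the referee asks for the metric to be named.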
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to provide the requested details and analyses.
- Referee: [Evaluation on new dense crowd dataset (abstract and results section)] The central SOTA claim of >2.6% absolute improvement on dense videos rests entirely on the newly introduced dataset. No details are provided on dataset collection procedure, sequence selection criteria, density statistics across sequences, or the train/test split protocol, making it impossible to determine whether the reported gain is independent of method-specific tuning or post-hoc sequence choice.
  Authors: We agree that the current manuscript lacks sufficient documentation on the new dense crowd dataset. In the revision we will add a dedicated subsection that describes the collection procedure, sequence selection criteria, per-sequence density statistics, and the train/test split protocol. This addition will allow readers to assess the independence of the reported gains. Revision: yes
- Referee: [Method and results] The abstract states that Front-RVO is combined with Mask R-CNN sparse features to reduce false negatives, yet no ablation is referenced that isolates the contribution of each component on the dense dataset; without such controls the attribution of the 2.6% gain remains unclear.
  Authors: We acknowledge that the manuscript does not contain an ablation isolating Front-RVO from the Mask R-CNN sparse features on the dense dataset. We will perform the required experiments and include an ablation study in the revised results section that reports tracking metrics with and without each component. Revision: yes
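The "with and without each component" design promised in the second response amounts to a full on/off grid over the two components. A minimal sketch of how such a grid could be enumerated follows. The component names `front_rvo` and `sparse_features` and the helper `ablation_configs` are hypothetical labels, and what replaces a disabled component (for example, which baseline motion model) is left unspecified because the page does not say.

```python
from itertools import product


def ablation_configs(components):
    """Enumerate every on/off combination of the named components.

    Returns a list of dicts mapping component name -> enabled flag.
    """
    return [
        dict(zip(components, flags))
        for flags in product([False, True], repeat=len(components))
    ]


# Hypothetical component labels for the two parts of the pipeline.
configs = ablation_configs(["front_rvo", "sparse_features"])

for config in configs:
    enabled = [name for name, on in config.items() if on] or ["neither"]
    print("run tracker with:", ", ".join(enabled))
```

Reporting the same tracking metric for all four runs on the dense dataset would let a reader see how much of the gain comes from each component alone and how much from their combination.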
Circularity Check
No circularity in performance claims or derivation
The paper describes an algorithmic pipeline (the Front-RVO motion model combined with Mask R-CNN sparse features) whose outputs are measured with standard tracking metrics on the established MOT benchmark and on a newly introduced dense-crowd dataset. No equations, predictions, or first-principles results are presented that reduce by construction to fitted parameters, self-citations, or renamed inputs. The reported speed-up (4.5×) and accuracy gain (over 2.6% absolute) are direct empirical measurements rather than tautological restatements of the method itself, so the evaluation protocol remains independent of the claimed results.