Fast Online 3D Multi-Camera Multi-Object Tracking and Pose Estimation
Pith reviewed 2026-05-10 12:09 UTC · model grok-4.3
The pith
An efficient Bayes-optimal filter enables fast 3D multi-object tracking and pose estimation from multiple monocular cameras using only 2D detections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that joint 3D multi-object tracking and pose estimation can be performed in real time across multiple cameras by an efficient implementation of a Bayes-optimal multi-object tracking filter operating solely on 2D bounding-box and pose detections from publicly available models. The method is claimed to be faster than state-of-the-art approaches without accuracy loss, and to remain robust as cameras are disconnected and reconnected during operation.
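The Bayes-optimal filter at the heart of this claim generalizes the familiar single-object predict/update recursion to sets of labeled objects. As a minimal, hypothetical sketch of that underlying recursion (a plain linear Kalman filter, not the paper's multi-object formulation):

```python
import numpy as np

def predict(x, P, F, Q):
    """Bayes prediction under a linear-Gaussian motion model x' = F x + noise(Q)."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Bayes update with a linear-Gaussian measurement z = H x + noise(R)."""
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Constant-velocity demo in 1D: state [position, velocity], position measured.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x, P = np.array([0.0, 0.0]), np.eye(2)
for k in range(1, 20):
    x, P = predict(x, P, F, Q)
    x, P = update(x, P, np.array([float(k)]), H, R)  # target moves 1 unit/step
```

Multi-object filters such as the generalized labeled multi-Bernoulli filter the paper builds on extend this recursion with data association and with track birth and death.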
What carries the argument
efficient implementation of a Bayes-optimal multi-object tracking filter
If this is right
- The algorithm is significantly faster than state-of-the-art methods.
- Accuracy is maintained without compromise.
- Only publicly available pre-trained 2D detection models are required.
- The system remains robust to intermittent camera disconnections and reconnections.
- It performs online joint 3D multi-object tracking and pose estimation.
Where Pith is reading between the lines
- This suggests 3D capabilities can be added to existing 2D vision systems with minimal overhead.
- The approach may support applications in environments with variable camera availability, such as moving platforms.
- Similar efficiency techniques could benefit other Bayesian tracking problems in computer vision.
Load-bearing premise
Reliable 2D bounding box and pose detections from off-the-shelf models are sufficient to drive accurate 3D multi-object tracking and pose estimation across cameras without additional 3D-specific training or calibration.
What would settle it
Testing the reported speed and accuracy on benchmark multi-camera datasets; failing to exceed state-of-the-art speed, or to match state-of-the-art accuracy, would falsify the claims.
Original abstract
This paper proposes a fast and online method for jointly performing 3D multi-object tracking and pose estimation using multiple monocular cameras. Our algorithm requires only 2D bounding box and pose detections, eliminating the need for costly 3D training data or computationally expensive deep learning models. Our solution is an efficient implementation of a Bayes-optimal multi-object tracking filter, enhancing computational efficiency while maintaining accuracy. We demonstrate that our algorithm is significantly faster than state-of-the-art methods without compromising accuracy, using only publicly available pre-trained 2D detection models. We also illustrate the robust performance of our algorithm in scenarios where multiple cameras are intermittently disconnected or reconnected during operation.
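The abstract's pipeline lifts 2D detections into 3D using known camera geometry. A minimal sketch of that lifting step, assuming calibrated cameras with known 3x4 projection matrices (linear DLT triangulation; the paper's filter fuses views probabilistically rather than by direct triangulation):

```python
import numpy as np

def triangulate(points_2d, projections):
    """Linear (DLT) triangulation of a single 3D point from its 2D image
    coordinates in several calibrated cameras.

    points_2d  : list of (u, v) normalized image coordinates
    projections: list of 3x4 camera matrices P = K [R | t]
    """
    rows = []
    for (u, v), P in zip(points_2d, projections):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    # Solution: right singular vector for the smallest singular value of A.
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

# Two identity-intrinsics cameras one unit apart along x, observing (1, 2, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = triangulate([(0.2, 0.4), (0.0, 0.4)], [P1, P2])
```

With exact detections the recovered point matches the true position; noisy 2D boxes and poses instead yield a least-squares estimate, which is why the paper wraps this geometry in a Bayesian filter.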
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a fast online algorithm for joint 3D multi-object tracking and pose estimation across multiple monocular cameras. It implements an efficient Bayes-optimal multi-object tracking filter that operates exclusively on 2D bounding-box and pose detections from publicly available pre-trained models, eliminating the need for 3D training data or heavy deep-learning inference. The authors claim substantial runtime improvements over prior art while preserving accuracy and demonstrate robustness to intermittent camera disconnections.
Significance. If the performance claims are substantiated with quantitative evidence, the work would offer a practical, calibration-light solution for real-time 3D perception in multi-camera surveillance and robotics settings. The emphasis on using only off-the-shelf 2D detectors and an efficient filter implementation could reduce deployment costs, but the absence of any reported error metrics, runtime numbers, or dataset details in the abstract leaves the significance difficult to evaluate at present.
major comments (3)
- [Abstract] Abstract: The central claim that the method is 'significantly faster than state-of-the-art methods without compromising accuracy' is unsupported by any quantitative results, error metrics (e.g., MOTA, MOTP, pose error), runtime benchmarks, or experimental protocol. This absence directly undermines evaluation of the 'no accuracy loss' assertion.
- [Method] Method section (presumed §3–4): The transition from 2D detections to metric 3D states via the Bayes filter presupposes multi-view geometry. No explicit treatment of camera intrinsics, extrinsics, or online calibration is described, yet the abstract asserts operation 'without ... calibration.' This geometric precondition is load-bearing for the accuracy claim and must be clarified with either a stated assumption or an auxiliary estimation procedure.
- [Experiments] Experimental evaluation (presumed §5): The robustness claim for 'intermittently disconnected or reconnected' cameras requires concrete metrics on tracking continuity and pose drift during camera loss events; without such results the claim remains unverified.
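For reference, the MOTA metric named in the first comment has a standard CLEAR-MOT definition (ref. [54]); a minimal sketch, with the counts as hypothetical inputs:

```python
def mota(misses, false_positives, id_switches, num_gt_objects):
    """CLEAR-MOT tracking accuracy: 1 minus the normalized total error count.

    num_gt_objects is the total number of ground-truth objects
    summed over all frames of the sequence.
    """
    return 1.0 - (misses + false_positives + id_switches) / num_gt_objects

# e.g. 10 misses, 5 false positives, 2 identity switches over 100 GT objects
score = mota(10, 5, 2, 100)  # -> 0.83
```

MOTP, the companion metric, is simply the mean localization error over all matched object-hypothesis pairs.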
minor comments (2)
- [Abstract] The abstract is unusually long and contains redundant phrasing ('fast and online', 'enhancing computational efficiency while maintaining accuracy'); condensing it would improve readability.
- [Method] Notation for the Bayes filter state vector and measurement model should be introduced with a clear table or diagram early in the method section to aid readers unfamiliar with the specific filter formulation.
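One plausible shape for such a notation summary, hedged since the paper's exact symbols are not reproduced in this review: a per-object kinematic state paired with a per-camera pinhole measurement model.

```latex
% Hypothetical notation sketch (not the paper's actual symbols):
% per-object kinematic state: 3D position p_k and its velocity
x_k = \begin{bmatrix} p_k \\ \dot p_k \end{bmatrix} \in \mathbb{R}^{6},
\qquad
% measurement from camera c: pinhole projection of p_k plus noise w
z_k^{(c)} = \pi\!\left( K_c \, [\, R_c \mid t_c \,]
            \begin{bmatrix} p_k \\ 1 \end{bmatrix} \right) + w_k^{(c)},
\qquad
\pi\!\left([a,\, b,\, c]^{\top}\right) = [\,a/c,\; b/c\,]^{\top}.
```

Here $K_c$, $R_c$, $t_c$ are camera $c$'s intrinsics and extrinsics, assumed known and fixed.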
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. We address each major comment in detail below, providing clarifications from the manuscript and indicating where revisions will be made to improve clarity and completeness.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that the method is 'significantly faster than state-of-the-art methods without compromising accuracy' is unsupported by any quantitative results, error metrics (e.g., MOTA, MOTP, pose error), runtime benchmarks, or experimental protocol. This absence directly undermines evaluation of the 'no accuracy loss' assertion.
Authors: The abstract provides a high-level summary of the contributions. Quantitative support for the speed and accuracy claims appears in Section 5, which reports runtime benchmarks against prior methods, MOTA/MOTP scores, and pose estimation errors on public multi-camera datasets using only off-the-shelf 2D detectors. To make the abstract self-contained for readers, we will revise it to include one or two key numerical highlights (e.g., average FPS improvement and accuracy parity) while preserving its brevity. revision: yes
-
Referee: [Method] Method section (presumed §3–4): The transition from 2D detections to metric 3D states via the Bayes filter presupposes multi-view geometry. No explicit treatment of camera intrinsics, extrinsics, or online calibration is described, yet the abstract asserts operation 'without ... calibration.' This geometric precondition is load-bearing for the accuracy claim and must be clarified with either a stated assumption or an auxiliary estimation procedure.
Authors: The method relies on standard multi-view geometry with known, fixed camera intrinsics and extrinsics; these are treated as given inputs, consistent with the majority of multi-camera tracking literature. The phrase 'without calibration' in the manuscript refers specifically to the absence of any online or dynamic calibration step and to the elimination of 3D-specific training data or heavy 3D inference. We will add an explicit statement in the revised method section clarifying this assumption and noting that no auxiliary online calibration procedure is required or performed. revision: yes
-
Referee: [Experiments] Experimental evaluation (presumed §5): The robustness claim for 'intermittently disconnected or reconnected' cameras requires concrete metrics on tracking continuity and pose drift during camera loss events; without such results the claim remains unverified.
Authors: Section 5 already includes timing and qualitative tracking continuity results under simulated camera disconnections. To strengthen the claim, we will augment the experimental section with quantitative metrics such as ID-switch rates, track-fragmentation counts, and average pose-error increase during disconnection intervals, computed on the same datasets used for the main evaluation. revision: yes
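The ID-switch counts the authors promise can be computed from per-frame ground-truth-to-track matchings; a minimal, hypothetical sketch following the usual CLEAR-MOT convention of comparing each object's current match against its most recent previous match:

```python
def count_id_switches(frame_assignments):
    """Count identity switches across a sequence of frames.

    frame_assignments: one dict per frame mapping ground-truth object id ->
    matched track id (omit an id when the object is missed in that frame).
    A switch is counted when an object's matched track id differs from the
    track id of its most recent previous match.
    """
    last_match = {}
    switches = 0
    for frame in frame_assignments:
        for gt_id, track_id in frame.items():
            if gt_id in last_match and last_match[gt_id] != track_id:
                switches += 1
            last_match[gt_id] = track_id
    return switches

# One object tracked as 'a', lost for a frame (e.g. a camera dropout), then
# re-acquired under a new track id 'b': one identity switch.
n = count_id_switches([{1: "a"}, {1: "a"}, {}, {1: "b"}])  # -> 1
```

Track fragmentations can be counted analogously from the gaps in each object's match sequence, which would make the promised disconnection experiments directly reproducible.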
Circularity Check
No significant circularity; the derivation is a self-contained implementation of a standard filter.
Full rationale
The paper presents its core contribution as an efficient implementation of a Bayes-optimal multi-object tracking filter that takes 2D bounding box and pose detections from publicly available pre-trained models as input. No equations, predictions, or first-principles results in the abstract or described approach reduce by construction to fitted parameters, self-definitions, or self-citation chains. The accuracy and speed claims are framed as empirical outcomes of applying the standard filter to external detections, without renaming known results or smuggling ansatzes via self-citation. The method remains open to external validation on calibration and geometry assumptions, but these do not create circularity within the derivation itself.
Reference graph
Works this paper leans on
- [1] L. Bridgeman, M. Volino, J.-Y. Guillemaut, and A. Hilton, "Multi-person 3D pose estimation and tracking in sports," in IEEE Conf. Comput. Vis. Pattern Recog. Workshops, 2019, pp. 2487–2496.
- [2] H. Bradler, A. Kretz, and R. Mester, "Urban traffic surveillance (UTS): A fully probabilistic 3D tracking approach based on 2D detections," in IEEE Intell. Vehicles Symp., 2021, pp. 1198–1205.
- [3] Z. Liao, J. Zhu, C. Wang, H. Hu, and S. L. Waslander, "Multiple view geometry transformers for 3D human pose estimation," in IEEE Conf. Comput. Vis. Pattern Recog., 2024, pp. 708–717.
- [4] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO series in 2021," arXiv preprint arXiv:2107.08430, 2021.
- [5] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields," in IEEE Conf. Comput. Vis. Pattern Recog., 2017, pp. 7291–7299.
- [6] H. Fang, J. Li, H. Tang, C. Xu, H. Zhu, Y. Xiu, Y.-L. Li, and C. Lu, "AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 6, pp. 7157–7173, 2022.
- [7] J. Ong, B.-T. Vo, B.-N. Vo, D. Y. Kim, and S. E. Nordholm, "A Bayesian filter for multi-view 3D multi-object tracking with occlusion handling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2246–2263, 2022.
- [8] L. V. Ma, T. T. D. Nguyen, B.-N. Vo, H. Jang, and M. Jeon, "Track initialization and re-identification for 3D multi-view multi-object tracking," Inf. Fusion, p. 102496, 2024.
- [9] B.-N. Vo, B.-T. Vo, and M. Beard, "Multi-sensor multi-object tracking with the generalized labeled multi-Bernoulli filter," IEEE Trans. Signal Process., vol. 67, no. 23, pp. 5952–5967, 2019.
- [10] S. M. Khan and M. Shah, "A multiview approach to tracking people in crowded scenes using a planar homography constraint," in Eur. Conf. Comput. Vis. Springer, 2006, pp. 133–146.
- [11] R. Eshel and Y. Moses, "Homography based multiple camera detection and tracking of people in a dense crowd," in IEEE Conf. Comput. Vis. Pattern Recog., 2008, pp. 1–8.
- [12] T. Chavdarova and F. Fleuret, "Deep multi-camera people detection," in IEEE Int. Conf. Mach. Learning and Appl., 2017, pp. 848–853.
- [13] P. Baqué, F. Fleuret, and P. V. Fua, "Deep occlusion reasoning for multi-camera multi-target detection," in IEEE Int. Conf. Comput. Vis., 2017, pp. 271–279.
- [14] D. M. H. Nguyen, R. Henschel, B. Rosenhahn, D. Sonntag, and P. Swoboda, "LMGP: Lifted multicut meets geometry projections for multi-camera multi-object tracking," in IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 8866–8875.
- [15] T. Teepe, P. Wolters, J. Gilg, F. Herzog, and G. Rigoll, "EarlyBird: Early-fusion for multi-view tracking in the bird's eye view," in IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024, pp. 102–111.
- [16] T. Gao, Z. Jia, W. Lin, and Y. Li, "Delving into monocular 3D vehicle tracking: a decoupled framework and a dedicated metric," Appl. Intell., vol. 53, no. 1, pp. 746–756, 2023.
- [17] K. Shim, K. Ko, J. Hwang, H. Jang, and C. Kim, "Fast online multi-target multi-camera tracking for vehicles," Appl. Intell., vol. 53, no. 23, pp. 28994–29004, 2023.
- [18] V. Belagiannis, S. Amin, M. Andriluka, B. Schiele, N. Navab, and S. Ilic, "3D pictorial structures for multiple human pose estimation," in IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 1669–1676.
- [19] H. Tu, C. Wang, and W. Zeng, "VoxelPose: Towards multi-camera 3D human pose estimation in wild environment," in Eur. Conf. Comput. Vis. Springer, 2020, pp. 197–212.
- [20] H. Ye, W. Zhu, C. Wang, R. Wu, and Y. Wang, "Faster VoxelPose: Real-time 3D human pose estimation by orthographic projection," in Eur. Conf. Comput. Vis. Springer, 2022, pp. 142–159.
- [21] N. Reddy, L. Guigues, L. Pischulini, J. Eledath, and S. G. Narasimhan, "TesseTrack: End-to-end learnable multi-person articulated 3D pose tracking," in IEEE Conf. Comput. Vis. Pattern Recog., 2021, pp. 15190–15200.
- [22] S. Wu, S. Jin, W. Liu, L. Bai, C. Qian, D. Liu, and W. Ouyang, "Graph-based 3D multi-person pose estimation using multi-view images," in IEEE Conf. Comput. Vis. Pattern Recog., 2021, pp. 11148–11157.
- [23] T. Wang, J. Zhang, Y. Cai, S. Yan, and J. Feng, "Direct multi-view multi-person 3D pose estimation," Adv. Neural Inf. Process. Syst., vol. 34, pp. 13153–13164, 2021.
- [24] V. K. Srivastav, K. Chen, and N. Padoy, "SelfPose3d: Self-supervised multi-person multi-view 3D pose estimation," in IEEE Conf. Comput. Vis. Pattern Recog., 2024, pp. 2502–2512.
- [25] Z. Wang, X. Nie, X. Qu, Y. Chen, and S. Liu, "Distribution-aware single-stage models for multi-person 3D pose estimation," in IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 13096–13105.
- [26] J. Lin and G. H. Lee, "Multi-view multi-person 3D pose estimation with plane sweep stereo," in IEEE Conf. Comput. Vis. Pattern Recog., 2021, pp. 11886–11895.
- [27] J. Dong, W. B. Jiang, Q.-X. Huang, H. Bao, and X. Zhou, "Fast and robust multi-person 3D pose estimation from multiple views," in IEEE Conf. Comput. Vis. Pattern Recog., 2019, pp. 7792–7801.
- [28] H. Chen, P. Guo, P. Li, G. H. Lee, and G. S. Chirikjian, "Multi-person 3D pose estimation in crowded scenes based on multi-view geometry," in Eur. Conf. Comput. Vis. Springer, 2020, pp. 541–557.
- [29] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, pp. 91–110, 2004.
- [30] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Conf. Comput. Vis. Pattern Recog., vol. 1, 2005, pp. 886–893.
- [31] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 580–587.
- [32] R. Girshick, "Fast R-CNN," in IEEE Int. Conf. Comput. Vis., 2015, pp. 1440–1448.
- [33] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Adv. Neural Inf. Process. Syst., vol. 28, 2015.
- [34] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in IEEE Conf. Comput. Vis. Pattern Recog., 2016, pp. 779–788.
- [35] X. Chen and A. L. Yuille, "Parsing occluded people by flexible compositions," in IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 3945–3954.
- [36] S. Kreiss, L. Bertoni, and A. Alahi, "OpenPifPaf: Composite fields for semantic keypoint detection and spatio-temporal association," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 13498–13511.
- [37] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in IEEE Int. Conf. Comput. Vis., 2017, pp. 2961–2969.
- [38] Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun, "Cascaded pyramid network for multi-person pose estimation," in IEEE Conf. Comput. Vis. Pattern Recog., 2018, pp. 7103–7112.
- [39] B. Xiao, H. Wu, and Y. Wei, "Simple baselines for human pose estimation and tracking," in Eur. Conf. Comput. Vis., 2018, pp. 466–481.
- [40] K. Sun, B. Xiao, D. Liu, and J. Wang, "Deep high-resolution representation learning for human pose estimation," in IEEE Conf. Comput. Vis. Pattern Recog., 2019, pp. 5693–5703.
- [41] T.-Y. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Eur. Conf. Comput. Vis. Springer, 2014, pp. 740–755.
- [42] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, "2D human pose estimation: New benchmark and state of the art analysis," in IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 3686–3693.
- [43] Z. Zhang, "A flexible new technique for camera calibration," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, 2000.
- [44] S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proc. IEEE, vol. 92, no. 3, pp. 401–422, 2004.
- [45] R. van der Merwe and E. Wan, Sigma-point Kalman filters for probabilistic inference in dynamic state-space models. Oregon Health & Science University, 2004.
- [46] H. W. Kuhn, "The Hungarian method for the assignment problem," Nav. Res. Logist. Q., vol. 2, no. 1-2, pp. 83–97, 1955.
- [47] R. Jonker and A. Volgenant, "A shortest augmenting path algorithm for dense and sparse linear assignment problems," Computing, vol. 38, no. 4, pp. 325–340, 1987.
- [48] Y. Zhang, P. Sun, Y. Jiang, D. Yu, Z. Yuan, P. Luo, W. Liu, and X. Wang, "ByteTrack: Multi-object tracking by associating every detection box," in Eur. Conf. Comput. Vis., 2022, pp. 1–21.
- [49] Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, "FairMOT: On the fairness of detection and re-identification in multiple object tracking," Int. J. Comput. Vis., vol. 129, no. 11, pp. 3069–3087, 2021.
- [50] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
- [51] T. Chavdarova, P. Baqué, S. Bouquet, A. Maksai, C. Jose, T. M. Bagautdinov, L. Lettry, P. V. Fua, L. V. Gool, and F. Fleuret, "Wildtrack: A multi-camera HD dataset for dense unscripted pedestrian detection," IEEE Conf. Comput. Vis. Pattern Recog., pp. 5030–5039, 2018.
- [52] Y. Hou, L. Zheng, and S. Gould, "Multiview detection with feature perspective transformation," in Eur. Conf. Comput. Vis. Springer, 2020, pp. 1–18.
- [53] H. Joo, H. Liu, L. Tan, L. Gui, B. C. Nabbe, I. Matthews, T. Kanade, S. Nobuhara, and Y. Sheikh, "Panoptic Studio: A massively multiview system for social motion capture," in IEEE Int. Conf. Comput. Vis., 2015.
- [54] K. Bernardin and R. Stiefelhagen, "Evaluating multiple object tracking performance: The CLEAR MOT metrics," EURASIP J. on Image and Video Process., vol. 2008, pp. 1–10, 2008.
- [55] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi, "Performance measures and a data set for multi-target, multi-camera tracking," in Eur. Conf. Comput. Vis. Springer, 2016, pp. 17–35.
- [56] M. Beard, B.-T. Vo, and B.-N. Vo, "A solution for large-scale multi-object tracking," IEEE Trans. Signal Process., vol. 68, pp. 2754–2769, 2020.
- [57] T. T. D. Nguyen, H. Rezatofighi, B.-N. Vo, B.-T. Vo, S. Savarese, and I. Reid, "How trustworthy are the existing performance evaluations for basic vision tasks?" IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 7, pp. 8538–8552, 2023.
- [58] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: A metric and a loss for bounding box regression," in IEEE Conf. Comput. Vis. Pattern Recog., 2019, pp. 658–666.
- [59] V. Belagiannis, X. Wang, B. Schiele, P. V. Fua, S. Ilic, and N. Navab, "Multiple human pose estimation with temporally consistent 3D pictorial structures," in Eur. Conf. Comput. Vis. Workshops. Springer, 2015, pp. 742–754.
- [60] V. Belagiannis, S. Amin, M. Andriluka, B. Schiele, N. Navab, and S. Ilic, "3D pictorial structures revisited: Multiple human pose estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 10, pp. 1929–1942, 2015.
- [61] S. Ershadi-Nasab, E. Noury, S. Kasaei, and E. Sanaei, "Multiple human 3D pose estimation from multiview images," Multimed. Tools Appl., vol. 77, pp. 15573–15601, 2018.
- [62] B.-T. Vo and B.-N. Vo, "Labeled random finite sets and multi-object conjugate priors," IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3460–3475, 2013.
- [63] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Norwood, MA: Artech House, 1999.
- [64] R. Qiu, M. Xu, Y. Yan, J. S. Smith, and X. Yang, "3D random occlusion and multi-layer projection for deep multi-camera pedestrian localization," in Eur. Conf. Comput. Vis., 2022.