Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation
Pith reviewed 2026-05-21 22:08 UTC · model grok-4.3
The pith
Any online 2D multi-camera tracking system extends to 3D by reconstructing targets as point clouds from depth data and recovering boxes via clustering and yaw refinement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The approach extends any online 2D multi-camera tracking system into 3D space by utilizing depth information to reconstruct a target in point-cloud space, and recovering its 3D box through clustering and yaw refinement following tracking. It also introduces an enhanced online data association mechanism that leverages the target's local ID consistency to assign global IDs across frames. The proposed framework is evaluated on the 2025 AI City Challenge's 3D MTMC dataset.
What carries the argument
Depth-based late aggregation that reconstructs 2D-tracked targets into point clouds and recovers 3D boxes through clustering and yaw refinement.
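The reconstruction step can be illustrated with a minimal sketch, assuming a pinhole camera model and a metric depth map aligned with the image. The function name `backproject_box`, the intrinsics `fx`, `fy`, `cx`, `cy`, the `max_depth` cut-off, and the convention that 0 marks missing depth are all assumptions made for illustration; the page does not describe the authors' implementation at this level of detail.

```python
def backproject_box(depth, box, fx, fy, cx, cy, max_depth=50.0):
    """Lift the valid depth pixels inside a 2D box to camera-frame 3D points.

    depth: 2D list of metric depths indexed [row][col]; 0 marks missing depth
           (an assumed convention, not taken from the paper).
    box:   (u_min, v_min, u_max, v_max) in pixels, with the max edge exclusive.
    """
    u_min, v_min, u_max, v_max = box
    points = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0.0 or z > max_depth:
                continue  # skip missing or implausibly far depth values
            # Standard pinhole back-projection of pixel (u, v) at depth z.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 3x3 depth map; the box covers the top two rows.
depth = [[0.0, 2.0, 2.0],
         [2.0, 2.0, 0.0],
         [4.0, 4.0, 4.0]]
pts = backproject_box(depth, (0, 0, 3, 2), fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

The resulting points would then be handed to the clustering and yaw-refinement stages; transforming them into a shared world frame requires the camera extrinsics, which this sketch leaves out.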
If this is right
- Existing 2D MTMC systems gain 3D perception capability without replacing all tracking components.
- Targets are reconstructed in point-cloud space to support automatic 3D environment perception.
- Local ID consistency enables improved global ID assignment across frames in online operation.
- The framework delivers competitive performance on 3D MTMC benchmarks such as the AI City Challenge.
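One way to picture global ID assignment from local ID consistency is a per-track vote: each single-camera track keeps its local ID over time, so the global ID proposed most often for that local track can override a noisy per-frame proposal. The sketch below is a hypothetical illustration of that idea only. The class `LocalConsistencyAssigner` and its majority-vote rule are assumptions, not the paper's association mechanism.

```python
from collections import Counter, defaultdict


class LocalConsistencyAssigner:
    """Hypothetical helper: stabilise global IDs using per-camera local track IDs."""

    def __init__(self):
        # (camera_id, local_id) -> Counter of global IDs proposed so far
        self.votes = defaultdict(Counter)

    def update(self, camera_id, local_id, proposed_global_id):
        """Record this frame's proposal and return the most-voted global ID."""
        history = self.votes[(camera_id, local_id)]
        history[proposed_global_id] += 1
        # most_common(1) returns a one-element list [(global_id, count)].
        return history.most_common(1)[0][0]


assigner = LocalConsistencyAssigner()
proposals = [7, 7, 3, 7]  # one inconsistent proposal at the third frame
assigned = [assigner.update("cam0", 12, g) for g in proposals]
```

Because the update only looks at past frames, a scheme of this kind stays compatible with online operation.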
Where Pith is reading between the lines
- The modular late-aggregation design could pair with multiple different 2D trackers to match specific camera hardware in surveillance networks.
- Reusing 2D models in this way may reduce development and compute costs compared to building dedicated 3D trackers from scratch.
- The method could extend to real-time applications in smart-city monitoring or traffic analysis if it can accommodate depth sources that vary between dedicated sensors and stereo estimation.
Load-bearing premise
Depth information is available and accurate enough to enable reliable point-cloud reconstruction of targets followed by clustering and yaw refinement to recover valid 3D boxes.
What would settle it
Applying the method to test scenes with noisy or missing depth maps and measuring whether recovered 3D boxes show large deviations from ground-truth 3D annotations on the AI City Challenge dataset.
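Such a test needs a deviation measure between recovered and ground-truth boxes. A minimal sketch, assuming each box is a dictionary with a `center` triple and that yaw is given in radians; these field names and the choice of metrics are assumptions, not the challenge's official evaluation protocol:

```python
import math


def center_error(box_a, box_b):
    """Euclidean distance between two box centres given as (x, y, z)."""
    return math.dist(box_a["center"], box_b["center"])


def yaw_error(yaw_a, yaw_b):
    """Smallest absolute angular difference in radians, in [0, pi]."""
    # Wrap the raw difference into [-pi, pi) before taking the magnitude.
    diff = (yaw_a - yaw_b + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)


pred = {"center": (3.0, 4.0, 0.0)}
truth = {"center": (0.0, 0.0, 0.0)}
dist = center_error(pred, truth)
wrapped = yaw_error(math.pi - 0.1, -math.pi + 0.1)
```

The wrap-around in `yaw_error` matters because two headings near +pi and -pi point in almost the same direction even though their raw difference is close to 2*pi.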
Original abstract
Multi-Target Multi-Camera Tracking (MTMC) is an essential computer vision task for automating large-scale surveillance. With camera calibration and depth information, the targets in the scene can be projected into 3D space, offering unparalleled levels of automatic perception of a 3D environment. However, tracking in the 3D space requires replacing all 2D tracking components from the ground up, which may be infeasible for existing MTMC systems. In this paper, we present an approach for extending any online 2D multi-camera tracking system into 3D space by utilizing depth information to reconstruct a target in point-cloud space, and recovering its 3D box through clustering and yaw refinement following tracking. We also introduce an enhanced online data association mechanism that leverages the target's local ID consistency to assign global IDs across frames. The proposed framework is evaluated on the 2025 AI City Challenge's 3D MTMC dataset, achieving 3rd place on the leaderboard.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to extend any online 2D multi-camera tracking system to 3D perception by reconstructing tracked targets into point clouds using depth information, then applying clustering and yaw refinement to recover 3D bounding boxes; it further introduces an enhanced online data association step that exploits local ID consistency for global ID assignment across frames. The framework is evaluated on the 2025 AI City Challenge 3D MTMC dataset and reports a 3rd-place ranking.
Significance. If the central claim holds, the approach offers a practical, modular upgrade path for existing 2D MTMC pipelines to 3D without replacing core tracking components, which is relevant for large-scale surveillance. The reported 3rd-place result on the challenge dataset supplies concrete empirical support for the overall pipeline. However, the absence of targeted robustness analysis on the depth-to-3D conversion step limits the strength of the evidence for real-world deployment.
major comments (2)
- [Method description of 3D box recovery] The depth-based late aggregation (point-cloud reconstruction followed by clustering and yaw refinement) is the sole mechanism converting 2D tracks into 3D boxes and is therefore load-bearing. No quantitative characterization or ablation is provided on how depth noise, sensor inaccuracies, or partial occlusions affect under-/over-segmentation or yaw stability, leaving the central claim without direct verification.
- [Experiments and results] Table or leaderboard results report 3rd place but contain no component ablations isolating the contribution of the proposed clustering/yaw refinement versus the base 2D tracker, nor any error analysis broken down by depth quality or occlusion level. This makes it impossible to confirm that the 3D extension itself drives the ranking rather than upstream 2D performance.
minor comments (2)
- [Abstract and §3] The abstract and method section would benefit from explicit naming of the clustering algorithm (e.g., DBSCAN parameters) and the exact yaw refinement procedure to allow reproduction.
- [Data association subsection] Notation for local versus global IDs is introduced but not consistently defined in equations or pseudocode, which reduces clarity of the data-association enhancement.
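On the reproducibility point about the clustering step, the sketch below shows what a density-based clustering pass over a reconstructed point cloud looks like. It is a compact textbook DBSCAN written with the standard library only, followed by a hypothetical `largest_cluster` helper that keeps the densest group as the target. The `eps` and `min_samples` values in the example are toy values chosen for six points, and selecting the largest cluster is an assumption rather than the authors' documented rule.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster index, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels


def largest_cluster(points, labels):
    """Hypothetical helper: return the points of the most populated cluster."""
    counts = {}
    for lab in labels:
        if lab != -1:
            counts[lab] = counts.get(lab, 0) + 1
    if not counts:
        return []
    best = max(counts, key=counts.get)
    return [p for p, lab in zip(points, labels) if lab == best]


# Five target points near z = 5 m and one background point at z = 20 m.
cloud = [(0.0, 0.0, 5.0), (0.1, 0.0, 5.0), (0.0, 0.1, 5.0),
         (0.1, 0.1, 5.0), (0.05, 0.05, 5.1), (0.0, 0.0, 20.0)]
labels = dbscan(cloud, eps=0.5, min_samples=3)
target = largest_cluster(cloud, labels)
```

This brute-force neighbour search is quadratic in the number of points; a production system would more likely rely on an indexed implementation from an existing library.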
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments correctly identify areas where additional analysis would strengthen the evidence for the proposed 3D extension. We address each major comment below and will incorporate revisions to improve the paper.
- Referee: [Method description of 3D box recovery] The depth-based late aggregation (point-cloud reconstruction followed by clustering and yaw refinement) is the sole mechanism converting 2D tracks into 3D boxes and is therefore load-bearing. No quantitative characterization or ablation is provided on how depth noise, sensor inaccuracies, or partial occlusions affect under-/over-segmentation or yaw stability, leaving the central claim without direct verification.
  Authors: We agree that a targeted robustness study on the depth-to-3D conversion would provide stronger verification of the central claim. The current evaluation relies on the real-world variations present in the 2025 AI City Challenge 3D MTMC dataset, which includes diverse depth qualities and occlusion scenarios as reflected in the 3rd-place result. To directly address the concern, we will add a new subsection with controlled experiments injecting synthetic depth noise and simulating partial occlusions to measure effects on clustering stability and yaw estimation.
  Revision: yes
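A controlled perturbation of the kind promised in this response could look like the following sketch. It assumes additive zero-mean Gaussian noise on valid pixels plus random pixel dropout as a crude stand-in for missing depth; the function `perturb_depth`, its parameters, and the 0-means-missing convention are illustrative assumptions, and real sensor noise is typically depth-dependent rather than uniform.

```python
import random


def perturb_depth(depth, sigma, dropout, seed=0):
    """Return a noisy copy of a depth map (2D list, metres; 0 = missing).

    sigma:   standard deviation of the additive Gaussian noise, in metres.
    dropout: probability of turning a valid pixel into a missing one.
    """
    rng = random.Random(seed)  # seeded so experiments are repeatable
    noisy = []
    for row in depth:
        new_row = []
        for z in row:
            if z <= 0.0 or rng.random() < dropout:
                new_row.append(0.0)  # keep or create a missing-depth pixel
            else:
                # Clamp at 0 so noise never yields a negative depth.
                new_row.append(max(0.0, z + rng.gauss(0.0, sigma)))
        noisy.append(new_row)
    return noisy


clean = [[1.0, 0.0], [2.0, 3.0]]
unchanged = perturb_depth(clean, sigma=0.0, dropout=0.0)
all_missing = perturb_depth(clean, sigma=0.1, dropout=1.0)
```

Sweeping `sigma` and `dropout` while re-running box recovery would give the noise-versus-error curves the referee asks for.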
- Referee: [Experiments and results] Table or leaderboard results report 3rd place but contain no component ablations isolating the contribution of the proposed clustering/yaw refinement versus the base 2D tracker, nor any error analysis broken down by depth quality or occlusion level. This makes it impossible to confirm that the 3D extension itself drives the ranking rather than upstream 2D performance.
  Authors: We concur that explicit component ablations and stratified error analysis are needed to isolate the contribution of the late-aggregation steps. In the revised manuscript we will include an ablation table comparing the full pipeline against a baseline that projects 2D tracks to 3D without clustering or yaw refinement. We will also add error breakdowns stratified by depth quality (using available sensor metadata) and occlusion level (derived from track visibility annotations) to demonstrate the incremental benefit of the proposed 3D components.
  Revision: yes
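The stratified breakdown described in this response amounts to grouping per-box errors by a stratum label and averaging within each group. A minimal sketch, assuming each record is a `(stratum, error)` pair; the stratum names and error values below are made-up placeholders, not results from the paper:

```python
from collections import defaultdict
from statistics import mean


def stratified_mean_error(records):
    """Group (stratum, error) pairs and return the mean error per stratum."""
    buckets = defaultdict(list)
    for stratum, error in records:
        buckets[stratum].append(error)
    return {stratum: mean(errors) for stratum, errors in buckets.items()}


# Placeholder values for illustration only.
records = [("occlusion:low", 0.25), ("occlusion:low", 0.75), ("occlusion:high", 2.0)]
summary = stratified_mean_error(records)
```

Running the same summary for the full pipeline and for the no-clustering baseline would show where the 3D components help most.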
Circularity Check
No circularity: standard depth-to-3D pipeline after 2D tracking
The paper's core chain is: run any existing online 2D MTMC tracker, project tracked targets into point-cloud space using supplied depth, apply clustering plus yaw refinement to obtain 3D boxes, and use local-ID consistency for global association. None of these steps is defined in terms of its own output, fitted to a subset and then re-predicted, or justified solely by a self-citation whose content is unverified. The method is presented as a modular post-processing extension whose validity is checked by leaderboard performance on the 2025 AI City 3D MTMC dataset. No equations or uniqueness theorems reduce the result to the input by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Depth information from cameras is available and sufficiently accurate for point-cloud reconstruction of tracked targets.
- domain assumption: Existing 2D multi-camera tracking produces sufficiently robust local IDs and trajectories to support subsequent 3D aggregation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: utilizing depth information to reconstruct a target in point-cloud space, and recovering its 3D box through clustering and yaw refinement following tracking
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: DBSCAN clustering ... epsilon ... min_samples=50 ... volume-based fusion ... $\mathrm{yaw} = \arctan\left(\frac{y_t - y_{t-10}}{x_t - x_{t-10}}\right)$
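The yaw expression quoted in the second passage estimates heading from the displacement between the current position and the position 10 frames earlier. A minimal sketch of that computation follows. It swaps the plain arctangent of the ratio for `math.atan2`, which handles a zero x-displacement and distinguishes all four quadrants; the function name `yaw_from_track`, the `min_motion` threshold, and the choice to return `None` for short or stationary tracks are assumptions, not details from the paper.

```python
import math


def yaw_from_track(track, lag=10, min_motion=1e-6):
    """Estimate heading from a list of (x, y) positions, oldest first.

    Uses the displacement between the latest position and the one `lag`
    frames earlier. Returns None when the track is too short or the target
    has not moved, since the heading is undefined in both cases.
    """
    if len(track) <= lag:
        return None
    x_t, y_t = track[-1]
    x_p, y_p = track[-1 - lag]
    dx, dy = x_t - x_p, y_t - y_p
    if math.hypot(dx, dy) < min_motion:
        return None
    # atan2 replaces arctan(dy / dx) to avoid division by zero.
    return math.atan2(dy, dx)


diagonal = [(float(i), float(i)) for i in range(11)]    # moving along +x, +y
backwards = [(-float(i), 0.0) for i in range(11)]       # moving along -x
yaw_diag = yaw_from_track(diagonal)
yaw_back = yaw_from_track(backwards)
```

With a plain arctangent of the ratio, the second track would yield a heading of 0 instead of pi, which is why the quadrant-aware variant is used here.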
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.