pith. sign in

arxiv: 2605.22997 · v1 · pith:LGXKUFPUnew · submitted 2026-05-21 · 💻 cs.CV

Scene Reconstruction as Mapping Priors for 3D Detection

Pith reviewed 2026-05-25 05:41 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D object detectionmapping priorsscene reconstructionautonomous drivingsensor fusionWaymo Open Dataset
0
0 comments X

The pith

Reconstructed scene maps from aggregated sensors serve as priors that improve 3D object detection by resolving ambiguities in sparse or noisy data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that mapping data, typically reserved for planning, supplies structural priors that correct sensor limitations during 3D detection, especially at distance or in bad weather. It presents an automatic pipeline that builds dense mapping priors from collected sensor streams without any human labeling step. These priors are then fed into a Mapping Priors Augmented 3D Detection framework that fuses them with camera or LiDAR inputs. On the Waymo Open Dataset the combined system reaches new state-of-the-art detection scores. The result indicates that scalable, label-free scene reconstructions can replace costly hand-crafted HD maps for perception tasks.

Core claim

Automatically constructed dense mapping priors, derived from aggregated sensor data, can be integrated with sensor modalities inside the MPA3D framework to produce state-of-the-art 3D detection performance on the Waymo Open Dataset by supplying static environmental structure that resolves sensor ambiguities.

What carries the argument

The Mapping Priors Augmented 3D Detection (MPA3D) framework that fuses reconstructed scene priors with different sensor inputs.

If this is right

  • Detection accuracy rises for distant objects where sensor returns are sparse.
  • Performance holds up better under adverse weather that degrades raw sensor quality.
  • Large-scale deployment no longer requires manual creation or maintenance of HD maps.
  • The same reconstructed priors can be reused across multiple perception modules without extra labeling cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on other autonomous-driving datasets to check whether the prior benefit generalizes beyond Waymo.
  • If priors are updated online from recent drives, detection might adapt to slow environmental changes such as construction.
  • Combining the priors with online mapping systems might reduce the need for high-resolution sensors in some regimes.

Load-bearing premise

Dense mapping priors built automatically from sensor data can be added to detection models without introducing new errors or biases that cancel the intended gains.

What would settle it

Run the identical detector on the same Waymo validation scenes once with the mapping priors and once without them; if the mAP or range-specific metrics show no consistent improvement when priors are present, the central claim fails.

Figures

Figures reproduced from arXiv: 2605.22997 by Chen Song, Dragomir Anguelov, Govind Thattai, Hao Xiang, Mingxing Tan, Weijing Shi, Xin Huang, Yang Fu, Yijing Bai, Yingwei Li, Yuliang Zou.

Figure 1
Figure 1. Figure 1: Scene Reconstruction as Mapping Priors for 3D Detection. (Left) Standard 3D detection using only sparse LiDAR data struggles to identify and distinguish multiple vehicles (yellow boxes) amidst background noise. (Right) Our method fuses the sparse LiDAR with dense, reconstructed mapping priors, providing rich context that resolves ambiguities (green boxes) and achieves accurate 3D detection. Abstract In aut… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of MPA3D. The framework processes multi-view camera images via a ResNet backbone and Lift-Splat-Shoot [40] to generate BEV representations. Simultaneously, LiDAR point clouds and mapping priors (e.g., surfels and 3DGS) are encoded using PointMLPs. A novel Gated Fusion Module adaptively integrates these sparse 3D modalities through hierarchical, element-wise gating operations. Finally, the fused ge… view at source ↗
read the original abstract

In autonomous driving, mapping is critical for motion planning but remains an under-utilized resource for perception tasks such as 3D object detection. Maps can provide robust structural priors of the static environment, helping resolve ambiguities and correct for sensor data sparsity or noise, especially for distant objects or under adverse weather conditions. However, conventional High-Definition (HD) maps are resource-intensive to obtain and maintain, which presents a challenge for efficient, large-scale deployment. In this paper, we propose a scalable solution to systematically leverage mapping to improve 3D detection by overcoming two primary challenges. First, we introduce a pipeline to automatically build dense mapping priors from aggregated sensor data, eliminating the need for human labeling. Second, we design a novel Mapping Priors Augmented 3D Detection (MPA3D) framework to effectively integrate mapping priors with different sensor modalities. Extensive experiments on the Waymo Open Dataset demonstrate that our approach achieves new state-of-the-art results, proving the effectiveness of scalable reconstructed scene priors for enhancing 3D detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a pipeline to automatically build dense mapping priors from aggregated sensor data without human labeling, and introduces the Mapping Priors Augmented 3D Detection (MPA3D) framework to integrate these priors with sensor modalities. It claims this overcomes challenges with conventional HD maps and achieves new state-of-the-art results on the Waymo Open Dataset for 3D object detection.

Significance. If the central claims were substantiated with verifiable experiments, the work could have significance for scalable perception in autonomous driving by reducing dependence on labor-intensive HD maps and improving robustness under sensor sparsity or adverse conditions. However, the provided abstract contains no technical details, equations, ablations, or quantitative evidence, so significance cannot be assessed.

major comments (1)
  1. [Abstract] Abstract: the claim that the approach 'achieves new state-of-the-art results' on Waymo supplies no methods, ablation studies, error bars, dataset details, or quantitative tables, so the data cannot be verified to support the claim as stated.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their comments. Below we address the major comment point by point.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the approach 'achieves new state-of-the-art results' on Waymo supplies no methods, ablation studies, error bars, dataset details, or quantitative tables, so the data cannot be verified to support the claim as stated.

    Authors: The abstract is intentionally concise and summarizes the key contribution and outcome. The full manuscript provides the requested details: Section 3 describes the MPA3D framework and mapping-prior construction pipeline with equations; Section 4.1 specifies the Waymo Open Dataset splits and evaluation protocol; Section 4.2 presents quantitative tables comparing against prior methods with mAP and mAPH metrics; Section 4.3 contains ablation studies; and error bars are reported where statistical variation is assessed. These sections directly substantiate the state-of-the-art claim. revision: no

Circularity Check

0 steps flagged

No significant circularity; no derivation chain present to inspect

full rationale

The supplied abstract and manuscript placeholder contain no equations, parameter fits, self-citations, or claimed derivations of any kind. Without visible technical steps that could reduce to inputs by construction, none of the enumerated circularity patterns can be exhibited. The paper's central claims are empirical SOTA results on Waymo; these are not shown to be forced by any internal definition or self-referential loop. This is the normal honest finding when no load-bearing mathematical content is supplied.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no equations, parameters, or explicit assumptions, so the ledger remains empty.

pith-pipeline@v0.9.0 · 5739 in / 1125 out tokens · 23859 ms · 2026-05-25T05:41:08.741151+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 3 internal anchors

  1. [1]

    Mad: Memory-augmented detection of 3d objects

    Ben Agro, Sergio Casas, Patrick Wang, Thomas Gilles, and Raquel Urtasun. Mad: Memory-augmented detection of 3d objects. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1449–1460, 2025. 3, 7

  2. [2]

    Bevmap: Map-aware bev modeling for 3d perception

    Mincheol Chang, Seokha Moon, Reza Mahjourian, and Jinkyu Kim. Bevmap: Map-aware bev modeling for 3d perception. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7419–7428, 2024. 2

  3. [3]

    Maps for autonomous driving: Full-process survey and frontiers.arXiv preprint arXiv:2509.12632, 2025

    Pengxin Chen, Zhipeng Luo, Xiaoqi Jiang, Zhangcai Yin, and Jonathan Li. Maps for autonomous driving: Full-process survey and frontiers.arXiv preprint arXiv:2509.12632, 2025. 2

  4. [4]

    Mppnet: Multi-frame feature intertwining with proxy points for 3d temporal object detection

    Xuesong Chen, Shaoshuai Shi, Benjin Zhu, Ka Chun Cheung, Hang Xu, and Hongsheng Li. Mppnet: Multi-frame feature intertwining with proxy points for 3d temporal object detection. InEuropean Conference on Computer Vision, pages 680–697. Springer, 2022. 3, 7

  5. [5]

    Point transformer.IEEE Access, 2021

    Nico Engel, Vasileios Belagiannis, and Klaus Dietmayer. Point transformer.IEEE Access, 2021. 2

  6. [6]

    Embracing Single Stride 3D Object Detector with Sparse Transformer

    Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. Embracing Single Stride 3D Object Detector with Sparse Transformer. InCVPR, 2022. 6

  7. [7]

    Fully Sparse 3D Object Detection

    Lue Fan, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. Fully Sparse 3D Object Detection. InNeurIPS, 2022. 6, 7

  8. [8]

    Fsd v2: Improving fully sparse 3d object detection with virtual voxels.arXiv preprint arXiv:2308.03755, 2023

    Lue Fan, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. Fsd v2: Improving fully sparse 3d object detection with virtual voxels.arXiv preprint arXiv:2308.03755, 2023. 6

  9. [9]

    Strobe: Streaming object detection from lidar packets

    Davi Frossard, Shun Da Suo, Sergio Casas, James Tu, and Raquel Urtasun. Strobe: Streaming object detection from lidar packets. InConference on Robot Learning, pages 1174–1183. PMLR, 2021. 2

  10. [10]

    Exploring recurrent long-term temporal fusion for multi-view 3d perception.IEEE Robotics and Automation Letters, 9(7):6544–6551, 2024

    Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, Hongyu Zhou, Weixin Mao, Yuang Peng, and Xiangyu Zhang. Exploring recurrent long-term temporal fusion for multi-view 3d perception.IEEE Robotics and Automation Letters, 9(7):6544–6551, 2024. 2

  11. [11]

    Msf: Motion-guided sequential fusion for efficient 3d object detection from point cloud sequences

    Chenhang He, Ruihuang Li, Yabin Zhang, Shuai Li, and Lei Zhang. Msf: Motion-guided sequential fusion for efficient 3d object detection from point cloud sequences. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5196–5205, 2023. 3, 7

  12. [12]

    Lef: Late-to-early temporal fusion for lidar 3d object detection

    Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, and Mingxing Tan. Lef: Late-to-early temporal fusion for lidar 3d object detection. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1637–1644. IEEE, 2023. 2, 7

  13. [13]

    Gaussian Error Linear Units (GELUs)

    D Hendrycks. Gaussian error linear units (gelus).arXiv preprint arXiv:1606.08415, 2016. 5

  14. [14]

    Query-based temporal fusion with explicit motion for 3d object detection.Advances in Neural Information Processing Systems, 36:75782–75797, 2023

    Jinghua Hou, Zhe Liu, Zhikang Zou, Xiaoqing Ye, Xiang Bai, et al. Query-based temporal fusion with explicit motion for 3d object detection.Advances in Neural Information Processing Systems, 36:75782–75797, 2023. 3

  15. [15]

    Vadet: Multi-frame lidar 3d object detection using variable aggregation

    Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, and Krzysztof Czarnecki. Vadet: Multi-frame lidar 3d object detection using variable aggregation. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 711–720. IEEE, 2025. 7

  16. [16]

    Ptt: Point-trajectory transformer for efficient temporal 3d object detection

    Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, and Yi-Hsuan Tsai. Ptt: Point-trajectory transformer for efficient temporal 3d object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14938–14947, 2024. 7

  17. [17]

    3d gaussian splatting for real-time radiance field rendering.ACM Trans

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1, 2023. 2, 4

  18. [18]

    A hierarchical 3d gaussian representation for real-time rendering of very large datasets.ACM Transactions on Graphics (TOG), 43(4):1–15, 2024

    Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3d gaussian representation for real-time rendering of very large datasets.ACM Transactions on Graphics (TOG), 43(4):1–15, 2024

  19. [19]

    3d gaussian splatting as markov chain monte carlo.Advances in Neural Information Processing Systems, 37:80965–80986, 2024

    Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Yang-Che Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splatting as markov chain monte carlo.Advances in Neural Information Processing Systems, 37:80965–80986, 2024. 2

  20. [20]

    Joint 3d proposal generation and object detection from view aggregation

    Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven L Waslander. Joint 3d proposal generation and object detection from view aggregation. In2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 1–8. IEEE, 2018. 2

  21. [21]

    Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom

    Alex H. Lang, Sourabh V ora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. InCVPR, 2019. 2, 6

  22. [22]

    Lidaraugment: Searching for scalable 3d lidar data augmentations.arXiv preprint arXiv:2210.13488, 2022

    Zhaoqi Leng, Guowang Li, Chenxi Liu, Ekin Dogus Cubuk, Pei Sun, Tong He, Dragomir Anguelov, and Mingxing Tan. Lidaraugment: Searching for scalable 3d lidar data augmentations.arXiv preprint arXiv:2210.13488, 2022. 6

  23. [23]

    3d fully convolutional network for vehicle detection in point cloud

    Bo Li. 3d fully convolutional network for vehicle detection in point cloud. In2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1513–1518. IEEE, 2017. 2

  24. [24]

    Logonet: Towards accurate 3d object detection with local-to-global cross-modal fusion

    Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, et al. Logonet: Towards accurate 3d object detection with local-to-global cross-modal fusion. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17524–17534, 2023. 7

  25. [25]

    Hdmapnet: An online hd map construction and evaluation framework

    Yicheng Li, Qi Li, Tian Guo, Li Wang, Yu Wang, Qinhong Zhang, Yi Ding, Yingfeng Zhang, and Liangjun Zheng. Hdmapnet: An online hd map construction and evaluation framework. InIEEE International Conference on Robotics and Automation (ICRA), pages 4628–4634, 2022. 2, 3

  26. [26]

    Modar: Using motion forecasting for 3d object detection in point cloud sequences

    Yingwei Li, Charles R Qi, Yin Zhou, Chenxi Liu, and Dragomir Anguelov. Modar: Using motion forecasting for 3d object detection in point cloud sequences. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9329–9339, 2023. 3, 7, 1

  27. [27]

    Stellar: Scaling 3d perception large models for autonomous driving,

    Yingwei Li, Xin Huang, Yang Liu, Yang Fu, Alex Zihao Zhu, Chen Song, Junwen Yao, Anant Subramanian, Hao Xiang, Weijing Shi, Yuliang Zou, Tom Hoddes, Zhaoqi Leng, Govind Thattai, Dragomir Anguelov, and Mingxing Tan. Stellar: Scaling 3d perception large models for autonomous driving,

  28. [28]

    Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 2

  29. [29]

    Pnpnet: End-to-end perception and prediction with tracking in the loop

    Ming Liang, Bin Yang, Wenyuan Zeng, Yun Chen, Rui Hu, Sergio Casas, and Raquel Urtasun. Pnpnet: End-to-end perception and prediction with tracking in the loop. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11553–11562, 2020. 3

  30. [30]

    Maptr: Structured modeling and learning for online vectorized hd map construction

    Yihan Liao, Yicheng Li, Yinghong Chen, Qinhong Zhang, and Li Zhang. Maptr: Structured modeling and learning for online vectorized hd map construction. InInternational Conference on Learning Representations (ICLR), 2023. arXiv preprint arXiv:2303.12574. 2, 3

  31. [31]

    Focal loss for dense object detection

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll ´ar. Focal loss for dense object detection. InProceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017. 5

  32. [32]

    Vectormapnet: End-to-end vectorized hd map learning

    Jing Liu, Zheng Ding, Tianqi Zhang, Jiaqi Chen, and Jifeng Zhang. Vectormapnet: End-to-end vectorized hd map learning. InInternational Conference on Machine Learning (ICML), 2023. arXiv preprint arXiv:2303.08785. 2, 3

  33. [33]

    Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation.arXiv preprint arXiv:2205.13542, 2022

    Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, and Song Han. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation.arXiv preprint arXiv:2205.13542, 2022. 4, 7

  34. [34]

    Lion: Linear group rnn for 3d object detection in point clouds.Advances in Neural Information Processing Systems, 37:13601–13626,

    Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, and Xiang Bai. Lion: Linear group rnn for 3d object detection in point clouds.Advances in Neural Information Processing Systems, 37:13601–13626,

  35. [35]

    Seed: A simple and effective 3d detr in point clouds

    Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, and Xiang Bai. Seed: A simple and effective 3d detr in point clouds. InEuropean Conference on Computer Vision, pages 110–126. Springer, 2024. 7

  36. [36]

    Rethinking network design and local geometry in point cloud: A simple resid- ual mlp framework

    Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, and Yun Fu. Rethinking network design and local geometry in point cloud: A simple residual mlp framework.arXiv preprint arXiv:2202.07123, 2022. 4

  37. [37]

    Nerf: Representing scenes as neural radiance fields for view synthesis.Communications of the ACM, 65(1):99–106, 2021

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis.Communications of the ACM, 65(1):99–106, 2021. 2

  38. [38]

    PhD thesis, Massachusetts Institute of Technology, 2022

    Teddy Ort.Autonomous navigation without HD prior maps. PhD thesis, Massachusetts Institute of Technology, 2022. 2

  39. [39]

    Surfels: Surface elements as rendering primitives

    Hanspeter Pfister, Matthias Zwicker, Jeroen Van Baar, and Markus Gross. Surfels: Surface elements as rendering primitives. InProceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 335–342, 2000. 2, 3

  40. [40]

    Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d

    Jonah Philion and Sanja Fidler. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. InEuropean conference on computer vision, pages 194–210. Springer, 2020. 2, 4, 6

  41. [41]

    Pillarnet: Real-time and high-performance pillar-based 3d object detection

    Guangsheng Shi, Ruifeng Li, and Chao Ma. Pillarnet: Real-time and high-performance pillar-based 3d object detection. InECCV, 2022. 6

  42. [42]

    From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network

    Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. In TPAMI, 2019. 6

  43. [43]

    Pv-rcnn++: Point-voxel feature set abstraction with local vector representation for 3d object detection

    Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, and Hongsheng Li. Pv-rcnn++: Point-voxel feature set abstraction with local vector representation for 3d object detection. InIJCV, 2023. 6

  44. [44]

    Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds

    Martin Simony, Stefan Milzy, Karl Amendey, and Horst-Michael Gross. Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds. InProceedings of the European conference on computer vision (ECCV) workshops, pages 0–0, 2018. 2

  45. [45]

    Scalability in perception for autonomous driving: Waymo open dataset

    Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception...

  46. [46]

    Rsn: Range sparse net for efficient, accurate lidar 3d object detection

    Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, and Dragomir Anguelov. Rsn: Range sparse net for efficient, accurate lidar 3d object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5725–5734, 2021. 5

  47. [47]

    Swformer: Sparse window transformer for 3d object detection in point clouds

    Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng, and Dragomir Anguelov. Swformer: Sparse window transformer for 3d object detection in point clouds. InECCV, 2022. 2, 3, 4, 5, 6, 7

  48. [48]

    Block-nerf: Scalable large scene neural view synthesis

    Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P Srinivasan, Jonathan T Barron, and Henrik Kretzschmar. Block-nerf: Scalable large scene neural view synthesis. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8248–8258, 2022. 2

  49. [49]

    Fully convolutional one-stage 3d object detection on liDAR range images

    Zhi Tian, Xiangxiang Chu, Xiaoming Wang, Xiaolin Wei, and Chunhua Shen. Fully convolutional one-stage 3d object detection on liDAR range images. InNIPS, 2022. 2

  50. [50]

    Dsvt: Dynamic sparse voxel transformer with rotated sets

    Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, and Liwei Wang. Dsvt: Dynamic sparse voxel transformer with rotated sets. InCVPR, 2023. 6

  51. [51]

    High definition map mapping and update: A general overview and future directions.arXiv preprint arXiv:2409.09726, 2024

    Benny Wijaya, Kun Jiang, Mengmeng Yang, Tuopu Wen, Yunlong Wang, Xuewei Tang, Zheng Fu, Taohua Zhou, and Diange Yang. High definition map mapping and update: A general overview and future directions.arXiv preprint arXiv:2409.09726, 2024. 2

  52. [52]

    3dgut: Enabling distorted cameras and secondary rays in gaussian splatting

    Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, and Zan Gojcic. 3dgut: Enabling distorted cameras and secondary rays in gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26036–26046, 2025. 4

  53. [53]

    Mim: High-definition maps incorporated multi-view 3d object detection.IEEE Transactions on Intelligent Transportation Systems, 26(3):3989–4001, 2025

    Jinsheng Xiao, Shurui Wang, Jian Zhou, Ziyue Tian, Hongping Zhang, and Yuan-Fang Wang. Mim: High-definition maps incorporated multi-view 3d object detection.IEEE Transactions on Intelligent Transportation Systems, 26(3):3989–4001, 2025. 2

  54. [54]

    Neural map prior for autonomous driving

    Xuan Xiong, Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, and Zhao Hang. Neural map prior for autonomous driving. InProceedings of the IEEE/CVF International Conference on Computer Vision (CVPR), 2023. 2, 3

  55. [55]

    Second: Sparsely embedded convolutional detection

    Yan Yan, Yuxing Mao, and Bo Li. Second: Sparsely embedded convolutional detection. InSensors, 2018. 2, 6

  56. [56]

    Hdnet: Exploiting hd maps for 3d object detection

    Bin Yang, Ming Liang, and Raquel Urtasun. Hdnet: Exploiting hd maps for 3d object detection. InProceedings of The 2nd Conference on Robot Learning, pages 146–155. PMLR, 2018. 2

  57. [57]

    Surfelgan: Synthesizing realistic sensor data for autonomous driving.arXiv, 2020

    Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, and Henrik Kretzschmar. Surfelgan: Synthesizing realistic sensor data for autonomous driving.arXiv, 2020. 2, 3

  58. [58]

    3d-man: 3d multi-frame attention network for object detection

    Zetong Yang, Yin Zhou, Zhifeng Chen, and Jiquan Ngiam. 3d-man: 3d multi-frame attention network for object detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1863–1872,

  59. [59]

    Center-based 3d object detection and tracking

    Tianwei Yin, Xingyi Zhou, and Philipp Krahenbuhl. Center-based 3d object detection and tracking. InCVPR,

  60. [60]

    Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

    Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, and Cho-Jui Hsieh. Large batch optimization for deep learning: Training bert in 76 minutes.arXiv preprint arXiv:1904.00962, 2019. 6

  61. [61]

    Hindsight is 20/20: Leveraging past traversals to aid 3d perception.arXiv preprint arXiv:2203.11405, 2022

    Yurong You, Katie Z Luo, Xiangyu Chen, Junan Chen, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, and Kilian Q Weinberger. Hindsight is 20/20: Leveraging past traversals to aid 3d perception.arXiv preprint arXiv:2203.11405, 2022. 2

  62. [62]

    Motiontrack: End-to-end transformer-based multi-object tracking with lidar-camera fusion

    Ce Zhang, Chengjie Zhang, Yiluan Guo, Lingji Chen, and Michael Happold. Motiontrack: End-to-end transformer-based multi-object tracking with lidar-camera fusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 151–160,

  63. [63]

    HEDNet: A hierarchical encoder-decoder network for 3d object detection in point clouds

    Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, and Xiaolin Hu. HEDNet: A hierarchical encoder-decoder network for 3d object detection in point clouds. InNeurIPS,

  64. [64]

    Safdnet: A simple and effective network for fully sparse 3d object detection

    Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, and Xiaolin Hu. Safdnet: A simple and effective network for fully sparse 3d object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14477–14486, 2024. 2, 6, 7

  65. [65]

    Iou loss for 2d/3d object detection

    Dingfu Zhou, Jin Fang, Xibin Song, Chenye Guan, Junbo Yin, Yuchao Dai, and Ruigang Yang. Iou loss for 2d/3d object detection. In2019 international conference on 3D vision (3DV), pages 85–94. IEEE, 2019. 6

  66. [66]

    Objects as Points

    Xingyi Zhou, Dequan Wang, and Philipp Kr¨ahenb¨uhl. Objects as points.arXiv preprint arXiv:1904.07850, 2019. 5

  67. [67]

    On the continuity of rotation representations in neural networks

    Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5745–5753,

  68. [68]

    End-to-end multi-view fusion for 3d object detection in lidar point clouds

    Yin Zhou, Pei Sun, Yu Zhang, Dragomir Anguelov, Jiyang Gao, Tom Ouyang, James Guo, Jiquan Ngiam, and Vijay Vasudevan. End-to-end multi-view fusion for 3d object detection in lidar point clouds. InConference on Robot Learning, pages 923–932. PMLR, 2020. 4

  69. [69]

    Centerformer: Center-based transformer for 3d object detection

    Zixiang Zhou, Xiangchen Zhao, Yu Wang, Panqu Wang, and Hassan Foroosh. Centerformer: Center-based transformer for 3d object detection. InECCV, 2022. 2, 6, 7