FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices
Pith reviewed 2026-05-19 10:31 UTC · model grok-4.3
The pith
FALO sorts sparse LiDAR voxels into a 1D sequence and processes them with ConvDotMix blocks, reaching competitive detection accuracy at much higher speed on mobile hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FALO arranges sparse 3D voxels into a 1D sequence based on their coordinates and proximity after voxelization. The sequence is processed by ConvDotMix blocks consisting of large-kernel convolutions, Hadamard products, and linear layers. Implicit grouping is introduced to balance tensor dimensions and account for the growing receptive field. These operations provide sufficient mixing capability in both spatial and embedding dimensions and introduce higher-order nonlinear interaction among spatial features. The resulting model achieves competitive performance on nuScenes and Waymo while running 1.6 to 9.8 times faster than the latest state-of-the-art detectors on mobile GPU and mobile NPU.
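The ordering step can be illustrated with a small sketch. The summary does not state the exact sort key, so the example below swaps in a Z-order (Morton) curve, one common way to turn integer voxel coordinates into a locality-preserving 1D sequence. The names `morton_key` and `serialize_voxels` are hypothetical and do not come from the paper.

```python
def _spread_bits(v: int, bits: int = 10) -> int:
    """Place bit i of v at position 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of (x, y, z) into a single Z-order index."""
    return (_spread_bits(x, bits)
            | (_spread_bits(y, bits) << 1)
            | (_spread_bits(z, bits) << 2))


def serialize_voxels(coords):
    """Return the indices that order non-empty voxels along the Z-order curve.

    Assumption: this stands in for the paper's coordinate/proximity sort,
    whose exact definition is not given in this summary.
    """
    return sorted(range(len(coords)), key=lambda i: morton_key(*coords[i]))


# Five non-empty voxels given as integer (x, y, z) grid coordinates.
coords = [(3, 3, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0), (2, 2, 0)]
order = serialize_voxels(coords)
sequence = [coords[i] for i in order]
print(sequence)
```

Any curve with a similar locality property (for example a Hilbert curve or a windowed lexicographic sort) could be substituted by changing only the sort key.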
What carries the argument
ConvDotMix blocks that combine large-kernel convolutions, Hadamard products, and linear layers with implicit grouping to mix spatial and embedding features on a 1D voxel sequence.
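A minimal sketch of how the three named ingredients could fit together is given here, in plain Python for readability. The wiring is an assumption: a depthwise convolution along the sequence, a token-wise linear branch, their Hadamard product, an output linear layer, and a residual connection. The summary names the operations but not their exact arrangement, and implicit grouping is omitted because it is not defined here. All function names are hypothetical.

```python
def depthwise_conv1d(x, kernels):
    """Per-channel 1D convolution along the sequence with zero 'same' padding.

    x: list of N tokens, each a list of C floats.
    kernels: list of C kernels, each a list of K floats (K odd).
    """
    n, c = len(x), len(x[0])
    k = len(kernels[0])
    half = k // 2
    out = [[0.0] * c for _ in range(n)]
    for t in range(n):
        for ch in range(c):
            acc = 0.0
            for j in range(k):
                src = t + j - half
                if 0 <= src < n:  # positions outside the sequence count as zero
                    acc += kernels[ch][j] * x[src][ch]
            out[t][ch] = acc
    return out


def linear(x, weight):
    """Token-wise linear layer; weight has C_out rows of C_in floats."""
    return [[sum(w * v for w, v in zip(row, tok)) for row in weight] for tok in x]


def conv_dot_mix(x, kernels, w_gate, w_out):
    """Assumed block layout: (conv branch * linear branch) -> linear -> residual."""
    spatial = depthwise_conv1d(x, kernels)   # mixes along the sequence
    gate = linear(x, w_gate)                 # mixes along the embedding
    mixed = [[s * g for s, g in zip(st, gt)]  # Hadamard product
             for st, gt in zip(spatial, gate)]
    proj = linear(mixed, w_out)
    return [[p + v for p, v in zip(pt, xt)] for pt, xt in zip(proj, x)]


# Toy input: 3 tokens, 2 channels. K=3 only for brevity; the paper uses large kernels.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
kernels = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
y = conv_dot_mix(x, kernels, identity, identity)
print(y)
```

Because the two branches are multiplied elementwise, each output contains products of input features from different sequence positions, which is one reading of the "higher-order nonlinear interaction" the paper describes.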
If this is right
- FALO can be deployed directly on compact embedded devices with mobile GPU or NPU hardware.
- The approach avoids the irregular memory patterns of sparse convolutions and the high costs of transformers.
- Detection accuracy remains competitive on established benchmarks such as nuScenes and Waymo.
- Inference runs 1.6 to 9.8 times faster than recent state-of-the-art detectors on mobile GPU and mobile NPU.
Where Pith is reading between the lines
- The 1D proximity ordering plus ConvDotMix mixing may offer a lightweight substitute for explicit 3D neighborhood operations in other sparse-data perception tasks.
- Hardware-specific tuning of the implicit grouping step could further improve efficiency on particular NPU architectures.
- If the higher-order interactions from the Hadamard products are the key accuracy driver, similar element-wise operations could be tested in related vision pipelines for speed gains.
Load-bearing premise
Arranging sparse voxels into a 1D sequence by coordinate proximity together with the ConvDotMix operations supplies enough spatial and embedding mixing to match the accuracy of sparse-convolution or transformer baselines without explicit 3D neighborhood modeling.
What would settle it
Run FALO on the nuScenes and Waymo validation sets, compare its detection metrics directly against the latest sparse-convolution and transformer detectors, and measure wall-clock inference latency on a mobile GPU and a mobile NPU to check whether both the claimed accuracy and the 1.6–9.8× speedup hold.
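The latency half of that check could be organized roughly as follows. This is a host-side sketch using only the Python standard library; the two workloads are stand-in callables, not real detectors, and a real measurement would call the vendor runtime on the target device. `median_latency_ms` and `speedup` are hypothetical helper names.

```python
import statistics
import time


def median_latency_ms(fn, warmup=5, runs=30):
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Stand-in workloads (assumptions): replace with one inference call per model.
baseline_model = lambda: sum(range(20000))
candidate_model = lambda: sum(range(2000))

base_ms = median_latency_ms(baseline_model)
cand_ms = median_latency_ms(candidate_model)
print(f"baseline {base_ms:.3f} ms, candidate {cand_ms:.3f} ms, "
      f"speedup {speedup(base_ms, cand_ms):.1f}x")
```

Reporting the baseline model and device alongside each ratio would make both ends of a speedup range traceable.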
Original abstract
Existing LiDAR 3D object detection methods predominantely rely on sparse convolutions and/or transformers, which can be challenging to run on resource-constrained edge devices, due to irregular memory access patterns and high computational costs. In this paper, we propose FALO, a hardware-friendly approach to LiDAR 3D detection, which offers both state-of-the-art (SOTA) detection accuracy and fast inference speed. More specifically, given the 3D point cloud and after voxelization, FALO first arranges sparse 3D voxels into a 1D sequence based on their coordinates and proximity. The sequence is then processed by our proposed ConvDotMix blocks, consisting of large-kernel convolutions, Hadamard products, and linear layers. ConvDotMix provides sufficient mixing capability in both spatial and embedding dimensions, and introduces higher-order nonlinear interaction among spatial features. Furthermore, when going through the ConvDotMix layers, we introduce implicit grouping, which balances the tensor dimensions for more efficient inference and takes into account the growing receptive field. All these operations are friendly to run on resource-constrained platforms and proposed FALO can readily deploy on compact, embedded devices. Our extensive evaluation on LiDAR 3D detection benchmarks such as nuScenes and Waymo shows that FALO achieves competitive performance. Meanwhile, FALO is 1.6~9.8x faster than the latest SOTA on mobile Graphics Processing Unit (GPU) and mobile Neural Processing Unit (NPU).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes FALO for LiDAR 3D object detection on resource-constrained devices. After voxelization of the input point cloud, sparse voxels are sorted into a 1D sequence according to coordinate proximity. This sequence is processed by ConvDotMix blocks that combine large-kernel 1D convolutions, Hadamard products, and linear layers, with implicit grouping applied across layers to balance dimensions and grow receptive fields. The method is claimed to deliver competitive detection accuracy on nuScenes and Waymo while providing 1.6–9.8× speedups over recent SOTA models on mobile GPU and NPU hardware.
Significance. If the accuracy claims are substantiated, the work would be significant for enabling real-time 3D detection on edge platforms. Converting irregular 3D sparse data into a hardware-friendly 1D sequence with custom mixing operations addresses a practical deployment gap that current sparse-convolution and transformer approaches have not fully closed.
major comments (2)
- [Experiments] Experimental section: The manuscript asserts competitive benchmark results on nuScenes and Waymo yet reports no error bars, no ablation studies isolating the contributions of proximity sorting, large-kernel convolution, Hadamard product, or implicit grouping, and no detailed training protocol or hyper-parameter settings. These omissions prevent verification that the observed accuracy is robust and attributable to the proposed components rather than implementation details.
- [Method] Method section (ConvDotMix and implicit grouping): The central accuracy claim rests on the premise that coordinate-proximity 1D sorting plus large-kernel 1D operations and implicit grouping supply sufficient 3D spatial mixing. A linear sort necessarily severs some true 3D adjacencies while creating spurious 1D neighbors; the paper provides no quantitative analysis, receptive-field visualization, or cross-object contamination study demonstrating that the resulting mixing matches the locality modeling of sparse 3D convolutions or transformers.
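The kind of quantitative analysis requested in the Method comment could start from a measurement like the one sketched here. It counts how many truly 3D-adjacent voxel pairs stay within a 1D window after sorting, and how many in-window pairs are not 3D neighbors. The definitions (Chebyshev distance for adjacency, a symmetric window of `half_window` positions) and the name `adjacency_stats` are assumptions made for illustration, not the paper's protocol.

```python
from itertools import combinations


def chebyshev(a, b):
    """Largest per-axis coordinate difference between two voxels."""
    return max(abs(p - q) for p, q in zip(a, b))


def adjacency_stats(sequence, half_window, radius=1):
    """Measure how well a 1D ordering preserves 3D neighborhoods.

    sequence: voxel coordinates listed in their 1D order.
    Returns (kept_fraction, spurious_fraction):
      kept_fraction     = 3D-adjacent pairs that are within the 1D window
      spurious_fraction = in-window pairs that are not 3D-adjacent
    """
    true_pairs = kept = window_pairs = spurious = 0
    for i, j in combinations(range(len(sequence)), 2):  # O(N^2), fine for a sketch
        near_3d = chebyshev(sequence[i], sequence[j]) <= radius
        near_1d = (j - i) <= half_window
        true_pairs += near_3d
        kept += near_3d and near_1d
        window_pairs += near_1d
        spurious += near_1d and not near_3d
    kept_fraction = kept / true_pairs if true_pairs else 0.0
    spurious_fraction = spurious / window_pairs if window_pairs else 0.0
    return kept_fraction, spurious_fraction


# A dense 4x4 patch in the z=0 plane, ordered row by row (y, then x).
row_major = [(x, y, 0) for y in range(4) for x in range(4)]
kept_fraction, spurious_fraction = adjacency_stats(row_major, half_window=1)
print(kept_fraction, spurious_fraction)
```

On this toy patch a window of one position keeps 12 of the 42 adjacent pairs, and 3 of the 15 in-window pairs are row-wrap neighbors that are far apart in 3D, which is the effect the referee describes. Sweeping `half_window` up to the kernel size would show how much a large kernel recovers.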
minor comments (2)
- [Abstract] Abstract: Typo 'predominantely' should read 'predominantly'.
- [Abstract] Abstract: The speedup range '1.6~9.8x' should specify the exact baseline models and hardware configurations for each end of the range.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects for improving experimental rigor and methodological clarity. We address each major comment below and indicate planned revisions to the manuscript.
- Referee: [Experiments] Experimental section: The manuscript asserts competitive benchmark results on nuScenes and Waymo yet reports no error bars, no ablation studies isolating the contributions of proximity sorting, large-kernel convolution, Hadamard product, or implicit grouping, and no detailed training protocol or hyper-parameter settings. These omissions prevent verification that the observed accuracy is robust and attributable to the proposed components rather than implementation details.
  Authors: We agree that these additions would improve reproducibility and help attribute performance gains to specific components. In the revised manuscript, we will report error bars from multiple independent training runs with different random seeds. We will also add ablation studies that isolate the individual contributions of proximity-based voxel sorting, large-kernel 1D convolutions, Hadamard products, and implicit grouping. Finally, we will include a dedicated subsection detailing the full training protocol, including all hyper-parameters, optimizer settings, learning rate schedules, data augmentations, and hardware used for training.
  Revision: yes
- Referee: [Method] Method section (ConvDotMix and implicit grouping): The central accuracy claim rests on the premise that coordinate-proximity 1D sorting plus large-kernel 1D operations and implicit grouping supply sufficient 3D spatial mixing. A linear sort necessarily severs some true 3D adjacencies while creating spurious 1D neighbors; the paper provides no quantitative analysis, receptive-field visualization, or cross-object contamination study demonstrating that the resulting mixing matches the locality modeling of sparse 3D convolutions or transformers.
  Authors: We acknowledge that a 1D proximity sort can disrupt some true 3D adjacencies and introduce spurious neighbors. However, the large-kernel 1D convolutions in ConvDotMix enable broad information propagation along the sequence, while implicit grouping progressively expands the effective receptive field across layers to recover 3D context. In the revision, we will add receptive-field visualizations demonstrating feature mixing from 3D-proximate voxels and a quantitative analysis of effective neighborhood sizes compared to sparse convolutions. We will also include a brief discussion of cross-object contamination, supported by the observation that coordinate-proximity sorting largely preserves object boundaries; a more extensive contamination study can be considered if space permits.
  Revision: partial
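The receptive-field argument in the authors' reply can be made concrete with simple arithmetic. For stacked stride-1, undilated 1D convolutions, each layer with kernel size k widens the receptive field by k - 1 sequence positions. The sketch below computes that span; the kernel size 13 is a hypothetical value chosen for illustration, and the effect of implicit grouping is not modeled because the summary does not define it.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field, in sequence positions, of stacked stride-1 undilated 1D convs."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def layers_needed(kernel_size, target_span):
    """Smallest number of identical layers whose receptive field covers target_span."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 to grow the receptive field")
    layers = 0
    while stacked_receptive_field([kernel_size] * layers) < target_span:
        layers += 1
    return layers


print(stacked_receptive_field([13] * 4))  # 1 + 4 * 12 = 49 positions
print(layers_needed(13, 100))             # 9 layers reach 109 positions
```

These spans are measured along the sorted sequence, not in metres, so how much 3D context they cover still depends on how well the ordering preserves neighborhoods.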
Circularity Check
No circularity: architecture and benchmarks are independent of fitted inputs
The paper proposes a new voxel-to-1D-sequence arrangement followed by ConvDotMix blocks (large-kernel conv, Hadamard product, linear layers, implicit grouping) and reports accuracy/speed on public nuScenes and Waymo benchmarks. No equations, self-citations, or fitted parameters are shown that define the claimed mixing capability or performance by construction from the same inputs. The derivation chain remains self-contained and externally falsifiable.