arxiv: 1812.05784 · v2 · pith:KPDAQPCQnew · submitted 2018-12-14 · 💻 cs.LG · cs.CV · stat.ML

PointPillars: Fast Encoders for Object Detection from Point Clouds

classification 💻 cs.LG cs.CV stat.ML
keywords detection encoders point clouds pointpillars object while accuracy

Object detection in point clouds is an important aspect of many robotics applications such as autonomous driving. In this paper we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. Recent literature suggests two types of encoders: fixed encoders tend to be fast but sacrifice accuracy, while encoders that are learned from data are more accurate but slower. In this work we propose PointPillars, a novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). While the encoded features can be used with any standard 2D convolutional detection architecture, we further propose a lean downstream network. Extensive experimentation shows that PointPillars outperforms previous encoders with respect to both speed and accuracy by a large margin. Despite only using lidar, our full detection pipeline significantly outperforms the state of the art, even among fusion methods, with respect to both the 3D and bird's eye view KITTI benchmarks. This detection performance is achieved while running at 62 Hz: a 2-4 fold runtime improvement. A faster version of our method matches the state of the art at 105 Hz. These benchmarks suggest that PointPillars is an appropriate encoding for object detection in point clouds.
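
To make the idea of "vertical columns (pillars)" concrete, here is a minimal sketch of the grouping step only: points are bucketed by their x-y grid cell, with height ignored. It is an illustration under stated assumptions, not the authors' implementation. The function name `group_into_pillars`, the `(x, y, z)` point layout, and the `cell_size` value are hypothetical, and the learned PointNet encoding the abstract describes is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical point layout: (x, y, z) coordinates in metres.
Point = Tuple[float, float, float]
PillarKey = Tuple[int, int]


def group_into_pillars(
    points: List[Point],
    cell_size: float = 0.5,  # assumed grid resolution; not given in the abstract
) -> Dict[PillarKey, List[Point]]:
    """Bucket points by x-y grid cell, ignoring z, so each bucket is one pillar."""
    pillars: Dict[PillarKey, List[Point]] = defaultdict(list)
    for x, y, z in points:
        # Floor division maps a coordinate to its integer cell index.
        key = (int(x // cell_size), int(y // cell_size))
        pillars[key].append((x, y, z))
    return dict(pillars)


# Toy input: two points share a cell, the other two fall in separate cells.
points = [(0.1, 0.2, 0.0), (0.3, 0.4, 1.5), (1.2, 0.1, 0.3), (-0.2, 0.1, 0.7)]
pillars = group_into_pillars(points)
for key, members in sorted(pillars.items()):
    print(key, len(members))
```

Because each pillar is indexed by a 2D cell, per-pillar features can be laid out as a 2D grid, which is consistent with the abstract's remark that the encoded features can feed a standard 2D convolutional detection architecture.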

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Analyzing the Cross-Sensor Portability of Neural Network Architectures for LiDAR-based Semantic Labeling

    cs.CV 2019-07 unverdicted novelty 4.0

    A new CNN architecture for LiDAR semantic labeling achieves higher cross-sensor portability with a reported 10 percentage point IoU gain over a reference method.

  2. Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds

    cs.CV 2019-06 unverdicted novelty 4.0

    Voxel-FPN proposes an encoder-decoder architecture for multi-scale voxel feature aggregation in one-stage 3D object detection from point clouds, reporting competitive speed and accuracy on KITTI-3D.