Online Inference and Detection of Curbs in Partially Occluded Scenes with Sparse LIDAR

Lars Kunze; Paul Newman; Tarlan Suleymanov

arxiv: 1907.05375 · v1 · pith:YSFE5PM5new · submitted 2019-07-11 · 💻 cs.RO · cs.CV · cs.LG

Pith reviewed 2026-05-24 23:04 UTC · model grok-4.3

keywords: curb detection, LIDAR, occlusion, deep learning, bird's-eye view, autonomous driving, real-time, road boundaries

The pith

Deep networks infer visible and occluded curbs from sparse LIDAR bird's-eye views in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to detect road curbs for autonomous vehicles even when traffic occludes them or lighting changes. Sparse 3D LIDAR points are projected into 2D overhead images that trained networks use to locate both seen and hidden boundary segments. Detected segments are then filtered and tracked across frames to deliver continuous 360-degree metric output. This supplies the precise road-edge data motion planners need without relying on clear weather or empty streets.

Core claim

Projecting 3D LIDAR pointcloud data into 2D bird's-eye view images allows trained deep networks to infer both visible and occluded road boundaries; a post-processing step filters the curb segments and tracks them over time to produce accurate real-time 360-degree detections under occlusion and varying conditions.

What carries the argument

Projection of sparse LIDAR point clouds into 2D bird's-eye view images that deep networks process to infer road boundaries.
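The projection step can be illustrated with a minimal sketch. It assumes points are already expressed in a vehicle-centred frame in metres, and it encodes each grid cell by the maximum height seen in that cell. The function name `project_to_bev`, the 20 m extent, the 0.5 m cell size, and the choice to write 0.0 into empty cells are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, vehicle frame


def project_to_bev(
    points: Sequence[Point],
    extent_m: float = 20.0,     # assumed half-width of the square region
    resolution_m: float = 0.5,  # assumed edge length of one grid cell
) -> List[List[float]]:
    """Rasterise 3D points into a top-down grid holding the max height per cell."""
    size = int(round(2 * extent_m / resolution_m))
    empty = float("-inf")
    grid = [[empty] * size for _ in range(size)]
    for x, y, z in points:
        # Shift so the vehicle sits at the grid centre, then quantise.
        col = int((x + extent_m) // resolution_m)
        row = int((extent_m - y) // resolution_m)
        # Points outside the region of interest are dropped.
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = max(grid[row][col], z)
    # Cells without any return are written as 0.0 (an arbitrary choice here).
    return [[0.0 if v == empty else v for v in r] for r in grid]


# Two nearby returns and one far outside the 4 m x 4 m window.
cloud = [(0.0, 0.0, 0.15), (0.1, 0.1, 0.05), (100.0, 0.0, 1.0)]
bev = project_to_bev(cloud, extent_m=2.0, resolution_m=1.0)
for bev_row in bev:
    print(bev_row)
```

With a sparse sensor most cells stay empty, which is one reason the described pipeline integrates several scans before projecting.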

If this is right

Motion planners receive continuous metric curb data around the full vehicle.
Detection remains possible when other vehicles block direct line of sight.
Performance holds across lighting and weather changes that affect camera-based methods.
Real-time operation meets the speed requirements of urban driving planners.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same projection-plus-network pipeline could be tested on other sparse range sensors such as solid-state LIDAR.
Combining the curb output with camera data might reduce false positives at road edges.
The tracking filter could be extended to predict curb locations a short distance ahead in time.

Load-bearing premise

Trained deep networks can accurately infer occluded road boundaries from the projected 2D bird's-eye view images of sparse LIDAR data.

What would settle it

Ground-truth curb locations recorded on a route with frequent traffic occlusions where the network outputs deviate by more than a fixed distance threshold from the measured edges.
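A check of this kind could be sketched as follows. The sketch assumes both the predicted and the surveyed curbs are available as 2D points in a common metric frame, and it uses a nearest-point distance as the error measure. The function names and the 0.3 m threshold are hypothetical placeholders, not values from the paper.

```python
import math
from typing import Dict, Sequence, Tuple

Point2D = Tuple[float, float]  # (x, y) in metres


def nearest_distance(p: Point2D, reference: Sequence[Point2D]) -> float:
    """Distance from p to the closest reference point."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in reference)


def deviation_report(
    predicted: Sequence[Point2D],
    ground_truth: Sequence[Point2D],
    threshold_m: float = 0.3,  # assumed tolerance, for illustration only
) -> Dict[str, float]:
    """Summarise how far predicted curb points lie from surveyed ones."""
    if not predicted or not ground_truth:
        raise ValueError("both point sets must be non-empty")
    distances = [nearest_distance(p, ground_truth) for p in predicted]
    over = sum(1 for d in distances if d > threshold_m)
    return {
        "max_error_m": max(distances),
        "mean_error_m": sum(distances) / len(distances),
        "fraction_over": over / len(distances),
    }


surveyed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
inferred = [(0.0, 0.1), (1.0, 0.5), (2.0, 0.0)]
print(deviation_report(inferred, surveyed))
```

Running the same report on occluded segments only would show whether the deviation concentrates where the sensor had no direct returns.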

Figures

Figures reproduced from arXiv: 1907.05375 by Lars Kunze, Paul Newman, Tarlan Suleymanov.

**Figure 1.** Our 360° LIDAR-based curb detection approach. First, LIDAR data from the ego-vehicle (white) is transformed into bird's-eye view images, which are then processed by trained deep networks to detect visible (white) and occluded (yellow) curbs. Finally, post-processing steps filter out outliers and track curbs over time (blue). The result is a robust curb detection around the vehicle over a total distance of 9…

**Figure 2.** Our 360° LIDAR-based curb detection approach. A pre-processing step integrates several subsequent laser scans into a coherent coordinate frame and projects them into a bird's-eye view image with height information (left). This image is then processed by two deep segmentation networks to detect both visible and occluded road boundaries (middle). Note that the network responsible for occluded boundaries addi…

**Figure 4.** Top left: integrated LIDAR pointcloud. Bottom left: Filtered…

**Figure 5.** Partitioning of training data. From left to right: bird's-eye view image, detected obstacles as well as visible and occluded curbs,…

**Figure 6.** Examples of labelled training data. Visible curbs are marked in white, while occluded curbs are marked in yellow.

**Figure 7.** Parameterisation of curb lines in discrete-continuous form.

**Figure 8.** Post-processing. From left to right: Output of curb detection networks (visible and occluded). Filtering step in which noise…

**Figure 9.** Post-processing steps: A first step consolidates detection…

**Figure 10.** First row: Sample outputs from the networks of detected and inferred road boundaries. Second row: Sample outputs after…

Original abstract

Road boundaries, or curbs, provide autonomous vehicles with essential information when interpreting road scenes and generating behaviour plans. Although curbs convey important information, they are difficult to detect in complex urban environments (in particular in comparison to other elements of the road such as traffic signs and road markings). These difficulties arise from occlusions by other traffic participants as well as changing lighting and/or weather conditions. Moreover, road boundaries have various shapes, colours and structures while motion planning algorithms require accurate and precise metric information in real-time to generate their plans. In this paper, we present a real-time LIDAR-based approach for accurate curb detection around the vehicle (360 degree). Our approach deals with both occlusions from traffic and changing environmental conditions. To this end, we project 3D LIDAR pointcloud data into 2D bird's-eye view images (akin to Inverse Perspective Mapping). These images are then processed by trained deep networks to infer both visible and occluded road boundaries. Finally, a post-processing step filters detected curb segments and tracks them over time. Experimental results demonstrate the effectiveness of the proposed approach on real-world driving data. Hence, we believe that our LIDAR-based approach provides an efficient and effective way to detect visible and occluded curbs around the vehicles in challenging driving scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper outlines a LIDAR BEV projection plus deep network pipeline for 360-degree curb detection that includes occluded cases, but the abstract gives no metrics or evaluation details to back the occlusion claim.

The core of this paper is a pipeline that takes sparse 3D LIDAR data, projects it into 2D bird's-eye-view (BEV) images, runs those through trained deep networks to find both visible and hidden curbs, then applies filtering and temporal tracking for real-time 360-degree coverage. It targets the practical needs of motion planning in urban scenes where traffic occludes boundaries and conditions change. The BEV step is a known technique, and using networks to infer missing segments is a direct way to get the metric output required. The authors report results on real driving data, which at least shows they tested the full stack end to end rather than only in simulation. That is the main positive: a complete, deployable-sounding system for a narrow but useful perception task.

The soft spot is exactly what the stress-test flags. The abstract asserts effectiveness on occluded curbs without any numbers, baselines, or breakdown of visible versus hidden performance. There is no mention of how ground-truth labels were created for regions with no LIDAR returns, so it is unclear whether the network is recovering actual geometry or simply extending visible lines in plausible ways. Without that separation in the evaluation, the central claim about handling occlusions stays unverified.

The paper is aimed at engineers working on autonomous vehicle perception who need curb geometry for planning. A reader already building similar LIDAR-to-image pipelines could pick up the post-processing and tracking steps. It is not a foundational result, but the topic is relevant enough that a serious editor should send it to referees so the experiments can be checked properly rather than desk-rejecting on the abstract alone.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a real-time LIDAR-based method for 360-degree curb detection that projects sparse 3D point clouds to 2D bird's-eye-view images, applies trained deep networks to infer both visible and occluded road boundaries, and uses post-processing plus temporal tracking to output metric curb segments. The central claim is that this handles occlusions from traffic and varying environmental conditions, with effectiveness demonstrated on real-world driving data.

Significance. If supported by rigorous evaluation, the work could contribute a practical perception module for autonomous vehicles in occluded urban scenes. The BEV projection plus deep inference approach is a standard and scalable direction for handling sparsity, and the emphasis on real-time 360-degree output aligns with motion-planning needs.

major comments (2)

[Abstract] Abstract: the assertion that 'experimental results demonstrate the effectiveness' of occluded curb inference is load-bearing for the central claim yet provides no metrics, baselines, error analysis, or separation of performance on occluded versus visible segments, preventing verification that the network generalizes rather than hallucinates hidden boundaries.
[Method] Method description: no information is given on the source or reliability of ground-truth labels for occluded curb segments used to train the deep networks; without this, the claim that the networks accurately recover metric geometry in regions with no direct LIDAR returns cannot be assessed.
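The separation requested in the first major comment could be reported with a per-class score such as the one sketched below. The sketch assumes the network outputs and the labels have been merged into integer masks, with 0 for background, 1 for visible curb and 2 for occluded curb. That encoding and the function name `per_class_iou` are assumptions for illustration; the manuscript describes two separate networks and does not specify this format.

```python
from typing import Dict, List

Mask = List[List[int]]

# Hypothetical label encoding used only in this sketch.
CLASSES = {"visible": 1, "occluded": 2}


def per_class_iou(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Intersection-over-union computed separately for each curb class."""
    scores: Dict[str, float] = {}
    for name, label in CLASSES.items():
        inter = 0
        union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                in_pred, in_truth = p == label, t == label
                inter += int(in_pred and in_truth)
                union += int(in_pred or in_truth)
        # The score is undefined when the class appears in neither mask.
        scores[name] = inter / union if union else float("nan")
    return scores


truth = [[1, 1, 0], [2, 2, 0]]
pred = [[1, 0, 0], [2, 2, 2]]
print(per_class_iou(pred, truth))
```

Pixel-level IoU is strict for thin structures such as curb lines, so a tolerance-based distance metric may be a fairer complement.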

minor comments (2)

[Abstract] The phrase 'akin to Inverse Perspective Mapping' is imprecise for a 3D-to-2D orthographic projection of LIDAR points; a brief clarification of the exact projection equations would improve reproducibility.
[Abstract] The abstract states that the approach 'provides accurate curb detection' but does not define accuracy criteria (e.g., lateral error tolerance or IoU threshold); adding this would strengthen the claims.
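For reference, an orthographic bird's-eye-view projection of the kind the first minor comment asks about is commonly written along the following lines. The symbols E (half-width of the region of interest), r (cell size) and the max-height encoding are assumptions made here for illustration, not equations taken from the manuscript.

```latex
% Assumed conventions (not from the paper): vehicle-frame point p_i = (x_i, y_i, z_i),
% square region [-E, E] x [-E, E] in metres, cell size r in metres per pixel.
\begin{align}
  u_i &= \left\lfloor \frac{x_i + E}{r} \right\rfloor, &
  v_i &= \left\lfloor \frac{E - y_i}{r} \right\rfloor, \\
  I(v, u) &= \max_{\,i \,:\, (u_i, v_i) = (u, v)} z_i .
\end{align}
```

Unlike Inverse Perspective Mapping, which warps a camera image through a planar homography, this mapping only quantises the horizontal coordinates and keeps height as the pixel value.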

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, indicating planned revisions to the manuscript.

Referee: [Abstract] Abstract: the assertion that 'experimental results demonstrate the effectiveness' of occluded curb inference is load-bearing for the central claim yet provides no metrics, baselines, error analysis, or separation of performance on occluded versus visible segments, preventing verification that the network generalizes rather than hallucinates hidden boundaries.

Authors: The abstract is a concise summary; detailed metrics, baselines, and error analysis appear in the Experiments section on real-world driving data with occlusions. We agree the abstract could better support the claim by referencing key results. We will revise the abstract to include quantitative metrics and clarify evaluation on occluded scenes. Full separation of occluded vs. visible performance metrics is not explicitly tabulated in the current version, but we will add discussion of generalization in occluded regions to address hallucination concerns. revision: partial

Referee: [Method] Method description: no information is given on the source or reliability of ground-truth labels for occluded curb segments used to train the deep networks; without this, the claim that the networks accurately recover metric geometry in regions with no direct LIDAR returns cannot be assessed.

Authors: This is a valid observation; the current method section lacks explicit details on occluded label sourcing. Ground truth for occluded segments was obtained via high-definition map alignment combined with multi-frame manual verification for consistency. We will add a dedicated paragraph in the revised method section describing the labeling process, sources, and reliability checks to allow assessment of the inference claims. revision: yes

Circularity Check

0 steps flagged

No circularity; pipeline relies on external training data without self-referential reduction

The paper presents a LIDAR-to-BEV projection followed by deep network inference of visible and occluded curbs, then post-processing and tracking. No equations, derivations, or fitted parameters are described that could reduce a claimed prediction to its own inputs by construction. The method depends on externally trained networks and real-world driving data without any self-citation load-bearing steps or ansatz smuggling. This is a standard applied ML pipeline whose central claim of occlusion handling rests on generalization from training examples rather than any definitional or self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no equations or implementation details, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5765 in / 830 out tokens · 17117 ms · 2026-05-24T23:04:18.066939+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 1 internal anchor

[1] L. Kunze, T. Bruls, T. Suleymanov, and P. Newman, “Reading between the lanes: Road layout reconstruction from partially segmented scenes,” in IEEE ITSC, Maui, Hawaii, USA, November 2018

[2] V. Prinet, J. Wang, J. Lee, and D. Wettergreen, “3D road curb extraction from image sequence for automobile parking assist system,” IEEE ICIP, pp. 3847–3851, 2016

[3] M. Kellner, U. Hofmann, M. E. Bouzouraa, and N. Stephan, “Multi-cue, model-based detection and mapping of road curb features using stereo vision,” in IEEE ITSC, Sept 2015, pp. 1221–1228

[4] L. Wang, T. Wu, Z. Xiao, L. Xiao, D. Zhao, and J. Han, “Multi-cue road boundary detection using stereo vision,” in IEEE ICVES, July 2016

[5] F. Oniga, S. Nedevschi, and M. M. Meinecke, “Curb detection based on a multi-frame persistence map for urban driving scenarios,” in IEEE ITSC, Oct 2008, pp. 67–72

[6] M. Kellner, M. E. Bouzouraa, and U. Hofmann, “Road curb detection based on different elevation mapping techniques,” in IEEE IV, June 2014, pp. 1217–1224

[7] J. Siegemund, U. Franke, and W. Förstner, “A temporal filter approach for detection and reconstruction of curbs and road surfaces based on conditional random fields,” in IEEE IV, June 2011, pp. 637–642

[8] M. Enzweiler, P. Greiner, C. Knöppel, and U. Franke, “Towards multi-cue urban curb recognition,” in IEEE IV, June 2013, pp. 902–907

[9] T. Suleymanov, P. Amayo, and P. Newman, “Inferring road boundaries through and despite traffic,” in IEEE ITSC, November 2018

[10] A. Y. Hata and D. F. Wolf, “Feature detection for vehicle localization in urban environments using a multilayer lidar,” T-ITS, vol. 17, no. 2, pp. 420–429, Feb 2016

[11] W. Yao, Z. Deng, and L. Zhou, “Road curb detection using 3d lidar and integral laser points for intelligent vehicles,” in SCIS/ISIS, Nov 2012, pp. 100–105

[12] Y. Zhang, J. Wang, X. Wang, and J. M. Dolan, “Road-segmentation-based curb detection method for self-driving via a 3d-lidar sensor,” T-ITS, vol. 19, no. 12, pp. 3981–3991, 2018

[13] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3d object detection network for autonomous driving,” IEEE CVPR, Jul 2017

[14] L. Caltagirone, S. Scheidegger, L. Svensson, and M. Wahde, “Fast lidar-based road detection using fully convolutional neural networks,” IEEE IV, Jun 2017

[15] D. Nistér, O. Naroditsky, and J. R. Bergen, “Visual odometry,” in IEEE CVPR. IEEE Computer Society, 2004, pp. 652–659

[16] E. Rosten, G. Reitmayr, and T. Drummond, “Real-time video annotations for augmented reality,” in Advances in Visual Computing. Springer, 2005, pp. 294–302

[17] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, and P. Fua, “Brief: Computing a local binary descriptor very fast,” IEEE TPAMI, vol. 34, no. 7, pp. 1281–1298, July 2012

[18] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, Jun. 1981

[19] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 Year, 1000km: The Oxford RobotCar Dataset,” IJRR, vol. 36, no. 1, pp. 3–15, 2017

[20] S. Katz, A. Tal, and R. Basri, “Direct visibility of point sets,” ACM Trans. Graph., vol. 26, no. 3, Jul. 2007

[21] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” CoRR, vol. abs/1505.04597, 2015

[22] R. B. Girshick, “Fast R-CNN,” in ICCV, 2015

[23] X. Pan, J. Shi, P. Luo, X. Wang, and X. Tang, “Spatial as deep: Spatial cnn for traffic scene understanding,” arXiv preprint arXiv:1712.06080, 2017

[24] R. Aldera, D. De Martini, M. Gadd, and P. Newman, “Fast radar motion estimation with a learnt focus of attention using weak supervision,” IEEE ICRA, 2019
