SHLE: Devices Tracking and Depth Filtering for Stereo-based Height Limit Estimation

Hongyan Liu; Jun He; Kaixing Yang; Min Zhang; Zhaoxin Fan; Zhenbo Song

arxiv: 2212.11538 · v2 · submitted 2022-12-22 · 💻 cs.CV

Zhaoxin Fan, Kaixing Yang, Min Zhang, Zhenbo Song, Hongyan Liu, Jun He

Pith reviewed 2026-05-24 10:30 UTC · model grok-4.3

classification 💻 cs.CV

keywords height limit estimation · stereo vision · device tracking · depth filtering · over-height vehicle · Disparity Height dataset · computer vision pipeline


The pith

A stereo pipeline tracks height-limit devices, then filters depth measurements over time to estimate their clearance, with an average error under 10 cm at 70 m range.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SHLE as a two-stage stereo vision system that first locates and follows height-limiting objects such as bridges or signs across video frames, then repeatedly samples depth at those locations, extracts stable points, and applies temporal filtering to arrive at a height value. This addresses frequent over-height vehicle collisions by giving drivers advance warning inside ordinary cars. The authors support the claim by releasing a new dataset of stereo pairs with disparity maps and annotated heights, then showing that the full pipeline beats prior methods while keeping error low even when the car is far away. A sympathetic reader would see the work as turning noisy stereo data into a practical, low-cost alert signal through tracking plus filtering rather than single-frame depth.

Core claim

SHLE achieves an average error below 10 cm even when the car is 70 m from the devices by first detecting and tracking the height limit objects in the left or right image, then temporally measuring, extracting, and filtering depth values to compute the limit; the method outperforms all compared baselines on the Disparity Height dataset and reaches state-of-the-art performance.

What carries the argument

The SHLE two-stage pipeline: devices detection and tracking followed by depth measurement, extraction, and filtering.

If this is right

Vehicles equipped with stereo cameras can generate real-time height alerts without expensive sensors.
Early detection at long range gives drivers time to adjust speed or route.
The same tracking-plus-filtering approach can be applied to other roadside objects whose clearance matters.
The released Disparity Height dataset provides a common test bed for future stereo height methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be combined with map data so that once a device is measured its height is stored for later trips.
If depth filtering proves robust in rain or at night, the same pipeline might extend to other low-light traffic safety tasks.
Integration with vehicle CAN bus data could allow automatic speed reduction when an over-height risk is confirmed.

Load-bearing premise

The depth filtering stage can reliably isolate and stabilize measurements to the tracked device across frames despite stereo matching noise, occlusions, or scene motion.

What would settle it

Run the pipeline on stereo video sequences of known-height devices at 70 m distance and measure whether the average absolute error exceeds 10 cm.

Figures

Figures reproduced from arXiv: 2212.11538 by Hongyan Liu, Jun He, Kaixing Yang, Min Zhang, Zhaoxin Fan, Zhenbo Song.

**Figure 2.** Display of the camera estimation results. To this end, we can get accurate height limit estimation. Note that we are the first to propose vision-based methods for height limit estimation for modern cars. Therefore, no publicly available dataset exists that we can use. To benchmark our task, we propose a novel large-scale dataset named "Disparity Height". "Disparity Height" is collected in natural o…

**Figure 3.** SHLE. For each frame f_i, SHLE takes the disparity map D_i and the RGB image I as input and outputs the height h of the corresponding height limit device. For each scene, SHLE generates a scene-level height after collecting the outputs of all frames. In stage 1, for each frame, we first run object detection with the Height Limit Device Detector F(∗) to get a bounding box b in the RGB image I, then apply object tracking by …

**Figure 4.** Comparison between prediction and ground truth.

**Figure 5.** Object tracking. For example, our dataset is a collection of image sequences taken in different scenes. Suppose a scene contains M frames with valid height limit devices; the number of frames in which the object detection method yields a valid prediction box may be fewer than M, as shown in …

**Figure 6.** Frustum. $$\begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} = R^{-1}\left(\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} - T\right) \tag{4}$$ Finally, we add the mounting height to the points' y-axis. However, $y_w$ at point $A \in p_w$ does not yet represent the real-world height. $y_w$ only represents the relative height of A and the stereo camera. At this point, it is also necessary to know the mounting height of the stereo camera $H_m$, and $H_m$ is 1.45 m in this paper. So the y-axis value $y'_w$ of the point A …
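The coordinate change in Eq. (4) and the mounting-height offset described in the Figure 6 text can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the identity extrinsics, and the sample point are hypothetical, and it assumes that R is a pure rotation (so its inverse equals its transpose) and that the world y-axis points up.

```python
def transpose(m):
    """Transpose a 3x3 matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def camera_to_world(p_c, R, T, mount_height=1.45):
    """Apply p_w = R^-1 (p_c - T), then add the camera mounting height to y.

    Assumption: R is a rotation matrix, so R^-1 is its transpose.
    mount_height defaults to the 1.45 m quoted in the text.
    """
    r_inv = transpose(R)
    diff = [c - t for c, t in zip(p_c, T)]
    x_w, y_w, z_w = mat_vec(r_inv, diff)
    return [x_w, y_w + mount_height, z_w]


# Hypothetical extrinsics: camera frame aligned with the world frame.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]

# Hypothetical point 1.55 m above the camera and 70 m ahead.
point = camera_to_world([0.5, 1.55, 70.0], R, T)
print(point)
```

Keeping the mounting height as a parameter makes it easy to reuse the same sketch for a camera installed at a different height.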

**Figure 7.** Pixel extension. First, we conduct pixel extension: we extend the lower boundary of the predicted box, run the frustum-based target extractor on the extended points, and finally back-project them into a point cloud. For the height limit estimation task, obtaining the lower edge line of the device is essential if the height in 3D space is to be calculated accurately. Existing object detection methods, even …
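The first step of the pixel extension described in the Figure 7 text, pushing the lower boundary of the predicted box downward, might look like the sketch below. The box layout `(x1, y1, x2, y2)`, the `extra_px` margin, and the function name are assumptions made for illustration; the source does not state how many pixels are added. Image coordinates are assumed to have y increasing downward, so the lower boundary is `y2`.

```python
def extend_lower_edge(box, extra_px, image_height):
    """Return the box with its lower boundary moved down by extra_px pixels.

    box is (x1, y1, x2, y2) in pixel coordinates with y increasing downward.
    The result is clamped so it never leaves the image.
    """
    x1, y1, x2, y2 = box
    y2_extended = min(y2 + extra_px, image_height - 1)
    return (x1, y1, x2, y2_extended)


# Hypothetical detection in a 720-pixel-high image.
extended = extend_lower_edge((100, 50, 300, 120), extra_px=15, image_height=720)
print(extended)
```

The extra rows below the detection are the ones most likely to contain the device's lower edge when the detector crops it slightly short.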

**Figure 8.** Incorrect Contain. The red bounding box is our labeled …

**Figure 9.** Kernel density estimation. Third, we conduct kernel density estimation [62]. We first assume that the probability distribution of the depth of the points in the object detection bounding box is similar to a normal distribution, that is, bell-shaped: low at the ends and high in the middle. Thus, we first treat it as a normal distribution with a large standard deviation. Then, interval center point …
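As a rough illustration of the idea in the Figure 9 text, the sketch below builds a Gaussian kernel density estimate over the depth values inside a box and takes the densest depth as the device depth. It is not the paper's procedure, whose remaining steps are cut off in the caption: the bandwidth, the grid search for the mode, and the synthetic depth values are all assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE of the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def kde_mode(samples, bandwidth, steps=200):
    """Approximate the KDE mode by evaluating the density on a uniform grid."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return lo
    density = gaussian_kde(samples, bandwidth)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=density)


# Synthetic depths (metres): a cluster near 70 m plus two outliers, standing
# in for background and foreground pixels caught by the bounding box.
depths = [69.8, 70.1, 70.0, 70.3, 69.9, 70.2, 95.0, 12.0]
mode = kde_mode(depths, bandwidth=0.5)
print(f"densest depth: {mode:.2f} m")
```

Unlike a plain mean, the density peak is barely moved by the two outliers, which is the property that makes a density-based estimate attractive for noisy stereo depth.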

**Figure 12.** Data annotation. V. Experiment, A. Dataset and Metric: Since we are the first to use vision-based methods for the height limit estimation task, no publicly available dataset exists that we can use. To benchmark our task, we propose a novel large-scale dataset named "Disparity Height". For the shooting setup, the baseline of our stereo camera is 120 mm, the camera mounting height is 1.45 m, the resol…

**Figure 13.** Visualisation of SHLE results. Our trained model is executed on demo data rather than training or validation data, …

**Figure 14.** Hyper-parameter. Fig. 14 shows the specific hyper-parameter process, taking …

Original abstract

Recently, over-height vehicle strike frequently occurs, causing great economic cost and serious safety problems. Hence, an alert system which can accurately discover any possible height limiting devices in advance is necessary to be employed in modern large or medium sized cars, such as touring cars. Detecting and estimating the height limiting devices act as the key point of a successful height limit alert system. Though there are some works research height limit estimation, existing methods are either too computational expensive or not accurate enough. In this paper, we propose a novel stereo-based pipeline named SHLE for height limit estimation. Our SHLE pipeline consists of two stages. In stage 1, a novel devices detection and tracking scheme is introduced, which accurately locate the height limit devices in the left or right image. Then, in stage 2, the depth is temporally measured, extracted and filtered to calculate the height limit device. To benchmark the height limit estimation task, we build a large-scale dataset named "Disparity Height", where stereo images, pre-computed disparities and ground-truth height limit annotations are provided. We conducted extensive experiments on "Disparity Height" and the results show that SHLE achieves an average error below than 10cm though the car is 70m away from the devices. Our method also outperforms all compared baselines and achieves state-of-the-art performance. Code is available at https://github.com/Yang-Kaixing/SHLE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SHLE gives a practical stereo pipeline plus dataset for vehicle height alerts, but the <10 cm error at 70 m is hard to square with standard stereo error growth unless the error statistics are broken out by distance.


The main points are a two-stage stereo system for spotting and measuring height-limiting devices from a moving car, plus the release of the Disparity Height dataset with stereo images, precomputed disparities, and annotations. Stage 1 handles detection and tracking; stage 2 does temporal depth extraction and filtering. Code is public, which helps anyone who wants to check the numbers or extend the work. That is the concrete contribution here. It targets a real automotive safety task and supplies a benchmark that others in applied stereo vision could use. The dataset and end-to-end pipeline are the parts that stand out as useful additions rather than incremental tweaks.

The soft spot is the headline accuracy number. The abstract says average error below 10 cm at 70 m and claims SOTA over baselines, yet stereo depth uncertainty grows with the square of distance. Even small residual disparity error after filtering can produce meter-scale uncertainty at that range with typical automotive focal lengths and baselines. The paper needs to show error broken out by distance and evidence that the filtering step actually delivers the variance reduction required under the dataset's noise, occlusions, and motion. If those plots and details are missing or if the metric is dominated by nearer samples, the central claim weakens. The abstract also gives little on the exact filtering method or baseline setups, so the SOTA statement is difficult to assess without the full methods and results sections.

This is for people working on practical stereo applications or automotive perception benchmarks. A reader who needs a dataset or a reference pipeline for height estimation would get something out of it. It is focused and grounded enough to go to peer review so referees can check the implementation, error analysis, and whether the long-range numbers hold up.

Referee Report

3 major / 3 minor

Summary. The paper proposes SHLE, a two-stage stereo pipeline for height-limit estimation in vehicles. Stage 1 detects and tracks height-limit devices in left/right images; stage 2 performs temporal depth measurement, extraction, and filtering to compute device heights. A new 'Disparity Height' dataset is introduced containing stereo images, pre-computed disparities, and ground-truth height annotations. Experiments on this dataset are reported to show average height error below 10 cm at distances up to 70 m, with SHLE outperforming all baselines and achieving state-of-the-art performance. Code is released at a public GitHub repository.

Significance. If the temporal filtering stage can demonstrably suppress stereo-matching noise, occlusions, and scene motion to the precision needed for sub-10 cm height error at 70 m, the work would offer a practical, deployable component for automotive height-limit alert systems. Public release of code and a new benchmark dataset are concrete strengths that support reproducibility.

major comments (3)

[Abstract] Abstract: the headline claim of average error below 10 cm at 70 m is load-bearing for the contribution, yet the manuscript provides neither distance-binned error statistics nor quantitative evidence that the stage-2 filtering reduces effective disparity error sufficiently to overcome the quadratic growth of stereo depth uncertainty (standard propagation δd ≈ (d²/(f·b))·δdisp).
[Stage 2] Stage 2 (depth filtering): the description of 'temporally measured, extracted and filtered' depth lacks the concrete algorithm (median, Kalman, or other), ablation studies, or variance-reduction measurements needed to evaluate whether it can isolate device measurements under the dataset's noise, occlusion, and motion conditions.
[Experiments] Experiments section: no table or figure reports error versus distance, dataset distance distribution, or long-range sample counts; without these the SOTA claim and the 70 m result cannot be verified against the known quadratic error scaling.
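The quadratic scaling the comments above refer to follows from differentiating Z = f·b/disp, and a few lines of arithmetic make its size concrete. The sketch below is illustrative only: the 120 mm baseline is the value quoted in the dataset description, while the focal length of 1000 px and the residual disparity error of 0.25 px are hypothetical placeholders, since neither is given here.

```python
def depth_error_m(depth_m, focal_px, baseline_m, disp_error_px):
    """First-order depth uncertainty from Z = f * b / disparity."""
    return depth_m ** 2 / (focal_px * baseline_m) * disp_error_px


FOCAL_PX = 1000.0      # hypothetical focal length in pixels
BASELINE_M = 0.12      # 120 mm baseline quoted in the dataset description
DISP_ERROR_PX = 0.25   # hypothetical residual disparity error in pixels

errors = {
    z: depth_error_m(z, FOCAL_PX, BASELINE_M, DISP_ERROR_PX)
    for z in (10.0, 35.0, 70.0)
}
for z, err in errors.items():
    print(f"depth {z:5.1f} m -> first-order depth error {err:.2f} m")
```

Under these assumed values, doubling the range from 35 m to 70 m quadruples the depth error, and the figure at 70 m is roughly 10 m. How much of that carries over into height error depends on the geometry and on the filtering, which is what the requested distance-binned statistics would show.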

minor comments (3)

[Abstract] Abstract contains the ungrammatical phrase 'below than 10cm'; correct to 'below 10 cm'.
Dataset statistics (number of sequences, distance histogram, number of devices at >50 m) are not reported, hindering assessment of the benchmark's difficulty and coverage.
Baseline implementations and training details are referenced only generically; explicit citations or configuration tables would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve verifiability of the results.


Referee: [Abstract] Abstract: the headline claim of average error below 10 cm at 70 m is load-bearing for the contribution, yet the manuscript provides neither distance-binned error statistics nor quantitative evidence that the stage-2 filtering reduces effective disparity error sufficiently to overcome the quadratic growth of stereo depth uncertainty (standard propagation δd ≈ (d²/(f·b))·δdisp).

Authors: We agree that distance-binned statistics and explicit evidence on filtering efficacy would strengthen the claim. In revision we will add a table/figure with mean absolute error binned by distance (including per-bin sample counts) and a quantitative comparison of disparity variance before versus after temporal filtering to demonstrate reduction relative to the quadratic uncertainty scaling. revision: yes
Referee: [Stage 2] Stage 2 (depth filtering): the description of 'temporally measured, extracted and filtered' depth lacks the concrete algorithm (median, Kalman, or other), ablation studies, or variance-reduction measurements needed to evaluate whether it can isolate device measurements under the dataset's noise, occlusion, and motion conditions.

Authors: We will expand the Stage 2 section to name the exact filtering algorithm and its parameters, add ablation results (with/without filtering), and report measured variance reduction on the disparity values under the dataset conditions. revision: yes
Referee: [Experiments] Experiments section: no table or figure reports error versus distance, dataset distance distribution, or long-range sample counts; without these the SOTA claim and the 70 m result cannot be verified against the known quadratic error scaling.

Authors: We will add to the Experiments section a plot or table of error versus distance, the distance histogram of the Disparity Height dataset, and explicit counts of samples at long ranges (including near 70 m) to support verification. revision: yes
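The distance-binned error table promised in these responses could be produced along the following lines. This is a sketch under assumptions: the tuple layout, the 10 m bin width, and the sample values are invented for illustration and are not results from the paper.

```python
from collections import defaultdict


def binned_mae(samples, bin_width_m=10.0):
    """Group absolute height errors by distance bin.

    samples: iterable of (distance_m, predicted_height_m, true_height_m).
    Returns {bin_start_m: (mean_absolute_error_m, sample_count)}.
    """
    errors = defaultdict(list)
    for distance, predicted, truth in samples:
        bin_start = int(distance // bin_width_m) * bin_width_m
        errors[bin_start].append(abs(predicted - truth))
    return {
        start: (sum(errs) / len(errs), len(errs))
        for start, errs in sorted(errors.items())
    }


# Synthetic measurements, not data from the Disparity Height dataset.
samples = [
    (12.0, 3.02, 3.0),
    (18.0, 2.97, 3.0),
    (65.0, 3.10, 3.0),
    (69.0, 2.92, 3.0),
]
table = binned_mae(samples)
for start, (mae, count) in table.items():
    print(f"{start:4.0f}-{start + 10:.0f} m: MAE {mae:.3f} m over {count} samples")
```

Reporting the count next to each mean shows whether an overall average is dominated by near-range samples, which is the concern raised in the referee report.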

Circularity Check

0 steps flagged

No circularity; empirical pipeline evaluated on held-out dataset


The SHLE pipeline is a two-stage stereo vision method (device detection/tracking then temporal depth extraction/filtering) whose performance claims are supported solely by empirical results on the independently annotated 'Disparity Height' dataset. No equations, derivations, or 'predictions' are presented that reduce by construction to fitted inputs, self-citations, or ansatzes; the method contains no load-bearing uniqueness theorems or self-referential definitions. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the method is presented as an empirical pipeline.



Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 2 internal anchors

[1] Dubey, Aish. "Stereo vision-Facing the challenges and seeing the opportunities for ADAS applications." Texas Instruments Technical Note (2016).
[2] Zaarane, Abdelmoghit, et al. "Distance measurement system for autonomous vehicles using stereo camera." Array 5 (2020): 100016.
[4] He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." IEEE transactions on pattern analysis and machine intelligence 37.9 (2015): 1904-1916.
[5] Nguyen, Bella. Vision-based over-height vehicle detection for warning drivers. Diss. University of Cambridge, 2018.
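[6] Liu, Meng (刘梦). "Design and implementation of a vehicle-mounted height-limit obstacle detection system" (in Chinese). 电光系统 2 (2018): 13-17.
[7] You, Anqing (游安清), et al. "Research on road parameter calculation methods for LiDAR-assisted driving" (in Chinese). 应用光学 41.1 (2020): 209.
[8] Zhang, Kuo (张阔). Research on vehicle-mounted measurement of road restriction geometric information and over-height warning methods (in Chinese). MS thesis. Yanshan University (燕山大学), 2014.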
[9] Kwak, Doo-Ahn, et al. "Detection of individual trees and estimation of tree height using LiDAR data." Journal of Forest Research 12.6 (2007): 425-434.
[10] Rosette, J. A. B., P. R. J. North, and J. C. Suarez. "Vegetation height estimates for a mixed temperate forest using satellite laser altimetry." International journal of remote sensing 29.5 (2008): 1475-1493.
[11] Chang, Anjin, et al. "Crop height monitoring with digital imagery from Unmanned Aerial System (UAS)." Computers and Electronics in Agriculture 141 (2017): 232-237.
[12] ten Harkel, Jelle, Harm Bartholomeus, and Lammert Kooistra. "Biomass and crop height estimation of different crops using UAV-based LiDAR." Remote Sensing 12.1 (2019): 17.
[13] Yuan, Wenan, et al. "Wheat height estimation using LiDAR in comparison to ultrasonic sensor and UAS." Sensors 18.11 (2018): 3731.
[14] Schirrmann, Michael, et al. "Regression kriging for improving crop height models fusing ultra-sonic sensing with UAV imagery." Remote Sensing 9.7 (2017): 665.
[15] Lang, Nico, et al. "Global canopy height regression and uncertainty estimation from GEDI LIDAR waveforms with deep ensembles." Remote Sensing of Environment 268 (2022): 112760.
[16] Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems 28 (2015).
[17] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[18] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[19] Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[20] Howard, Andrew, et al. "Searching for mobilenetv3." Proceedings of the IEEE/CVF international conference on computer vision. 2019.
[21] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
[22] Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[23] Arnab, Anurag, et al. "Vivit: A video vision transformer." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[24] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
[25] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
[26] Zhou, Zongwei, et al. "Unet++: A nested u-net architecture for medical image segmentation." Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer, Cham, 2018. 3-11.
[27] Li, Xiaomeng, et al. "H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes." IEEE transactions on medical imaging 37.12 (2018): 2663-2674.
[28] Babenko, Boris, Ming-Hsuan Yang, and Serge Belongie. "Robust object tracking with online multiple instance learning." IEEE transactions on pattern analysis and machine intelligence 33.8 (2010): 1619-1632.
[29] Henriques, João F., et al. "High-speed tracking with kernelized correlation filters." IEEE transactions on pattern analysis and machine intelligence 37.3 (2014): 583-596.
[30] Lukezic, Alan, et al. "Discriminative correlation filter with channel and spatial reliability." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[31] Bolme, David S., et al. "Visual object tracking using adaptive correlation filters." 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, 2010.
[32] Grabner, Helmut, Michael Grabner, and Horst Bischof. "Real-time tracking via on-line boosting." Bmvc. Vol. 1. No. 5. 2006. 13.
[33] Kalal, Zdenek, Krystian Mikolajczyk, and Jiri Matas. "Forward-backward error: Automatic detection of tracking failures." 2010 20th international conference on pattern recognition. IEEE, 2010.
[34] Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." Advances in neural information processing systems 27 (2014).
[35] Eigen, David, and Rob Fergus. "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture." Proceedings of the IEEE international conference on computer vision. 2015.
[36] Fu, Huan, et al. "Deep ordinal regression network for monocular depth estimation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[37] Bhat, Shariq Farooq, Ibraheem Alhashim, and Peter Wonka. "Adabins: Depth estimation using adaptive bins." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[38] Godard, Clément, Oisin Mac Aodha, and Gabriel J. Brostow. "Unsupervised monocular depth estimation with left-right consistency." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[39] Ling, Chuanwu, Xiaogang Zhang, and Hua Chen. "Unsupervised monocular depth estimation using attention and multi-warp reconstruction." IEEE Transactions on Multimedia (2021).
[40] Gordon, Ariel, et al. "Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[41] Ma, Fangchang, Guilherme Venturelli Cavalheiro, and Sertac Karaman. "Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[42] Song, Zhenbo, et al. "Self-Supervised Depth Completion From Direct Visual-LiDAR Odometry in Autonomous Driving." IEEE Transactions on Intelligent Transportation Systems (2021).
[43] Choi, Jaehoon, et al. "Selfdeco: Self-supervised monocular depth completion in challenging indoor environments." 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.
[44] Gupta, Saurabh, et al. "Learning rich features from RGB-D images for object detection and segmentation." European conference on computer vision. Springer, Cham, 2014.
[45] Eitel, Andreas, et al. "Multimodal deep learning for robust RGB-D object recognition." 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015.
[46] Zhou, Hao, et al. "CANet: Co-attention network for RGB-D semantic segmentation." Pattern Recognition 124 (2022): 108468.
[47] Barron, Jonathan T., and Jitendra Malik. "Intrinsic scene properties from a single rgb-d image." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.
[48] Liu, Beyang, Stephen Gould, and Daphne Koller. "Single image depth estimation from predicted semantic labels." 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, 2010.
[49] Klingner, Marvin, et al. "Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance." European Conference on Computer Vision. Springer, Cham, 2020.
[50] Pham, Cuong Cao, and Jae Wook Jeon. "Robust object proposals re-ranking for object detection in autonomous driving using convolutional neural networks." Signal Processing: Image Communication 53 (2017): 110-122.
[51] Xiang, Yu, et al. "Data-driven 3d voxel patterns for object category recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
[52] Qi, Charles R., et al. "Pointnet: Deep learning on point sets for 3d classification and segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[53] Qi, Charles Ruizhongtai, et al. "Pointnet++: Deep hierarchical feature learning on point sets in a metric space." Advances in neural information processing systems 30 (2017).
[54] Qi, Charles R., et al. "Frustum pointnets for 3d object detection from rgb-d data." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[55] Aoki, Yasuhiro, et al. "Pointnetlk: Robust & efficient point cloud registration using pointnet." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

work page 2019
[56]

”CoBEVT: Cooperative bird’s eye view semantic segmentation with sparse transformers.” arXiv preprint arXiv:2207.02202 (2022)

Xu, Runsheng, et al. ”CoBEVT: Cooperative bird’s eye view semantic segmentation with sparse transformers.” arXiv preprint arXiv:2207.02202 (2022)

work page arXiv 2022
[57]

”V2X-ViT: Vehicle-to-everything cooperative per- ception with vision transformer.” arXiv preprint arXiv:2203.10638 (2022)

Xu, Runsheng, et al. ”V2X-ViT: Vehicle-to-everything cooperative per- ception with vision transformer.” arXiv preprint arXiv:2203.10638 (2022)

work page arXiv 2022
[58]

Zhu, Xingkui, et al. ”TPH-YOLOv5: Improved YOLOv5 based on trans- former prediction head for object detection on drone-captured scenarios.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021

work page 2021
[59]

”Centernet: Keypoint triplets for object detection.” Proceedings of the IEEE/CVF international conference on computer vision

Duan, Kaiwen, et al. ”Centernet: Keypoint triplets for object detection.” Proceedings of the IEEE/CVF international conference on computer vision. 2019

work page 2019
[60]

”Focal loss for dense object detection.” Proceedings of the IEEE international conference on computer vision

Lin, Tsung-Yi, et al. ”Focal loss for dense object detection.” Proceedings of the IEEE international conference on computer vision. 2017

work page 2017
[61]

”Fcos: Fully convolutional one-stage object detection.” Proceedings of the IEEE/CVF international conference on computer vision

Tian, Zhi, et al. ”Fcos: Fully convolutional one-stage object detection.” Proceedings of the IEEE/CVF international conference on computer vision. 2019

work page 2019
[62]

”On estimation of a probability density function and mode.” The annals of mathematical statistics 33.3 (1962): 1065-1076

Parzen, Emanuel. ”On estimation of a probability density function and mode.” The annals of mathematical statistics 33.3 (1962): 1065-1076

work page 1962
[63]

”A new approach to linear ﬁltering and predic- tion problems.” (1960): 35-45

Kalman, Rudolph Emil. ”A new approach to linear ﬁltering and predic- tion problems.” (1960): 35-45

work page 1960
