CaRLi-V: Camera-RADAR-LiDAR Point-Wise 3D Velocity Estimation
Pith reviewed 2026-05-18 01:26 UTC · model grok-4.3
The pith
CaRLi-V fuses a RADAR velocity cube, camera optical flow, and LiDAR ranges through a closed-form solution to estimate 3D velocities for a dense set of points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The pipeline builds a novel RADAR representation from raw RADAR measurements, the velocity cube, which densely encodes RADAR radial velocities. A closed-form solution then combines three inputs: the velocity cube for radial velocity extraction, optical flow for tangential velocity estimation, and LiDAR for point-wise range measurements. The result is a 3D velocity estimate for a dense array of points, which the authors report outperforms state-of-the-art scene flow methods.
What carries the argument
The velocity cube, a dense encoding of RADAR radial velocities. It supplies the radial velocity component; optical flow supplies the tangential components and LiDAR supplies the range, and a closed-form solution combines the three into full 3D velocity vectors.
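The page does not reproduce the paper's closed-form equations, so the following is only a minimal sketch of one way such a solution can be written. It assumes a pinhole camera, a static sensor rig, and RADAR, LiDAR, and camera measurements that have already been expressed at a common origin in the camera frame. The names `project_motion` and `velocity_from_fusion` and all parameters are hypothetical and are not taken from the CaRLi-V package.

```python
import math


def project_motion(point, velocity, fx, fy, cx, cy):
    """Simulate the three measurements for a point moving in the camera frame.

    Hypothetical helper used only to check the sketch.
    Returns pixel (u, v), optical flow in px/s, range in m, radial velocity in m/s.
    """
    X, Y, Z = point
    Vx, Vy, Vz = velocity
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    # Time derivative of the pinhole projection u = fx * X / Z + cx.
    flow_u = fx * (Vx * Z - X * Vz) / (Z * Z)
    flow_v = fy * (Vy * Z - Y * Vz) / (Z * Z)
    rng = math.sqrt(X * X + Y * Y + Z * Z)
    # Radial velocity is the projection of the velocity onto the line of sight.
    v_radial = (X * Vx + Y * Vy + Z * Vz) / rng
    return u, v, flow_u, flow_v, rng, v_radial


def velocity_from_fusion(u, v, flow_u, flow_v, rng, v_radial, fx, fy, cx, cy):
    """Recover (Vx, Vy, Vz) from optical flow, range, and radial velocity.

    Assumption: all three measurements refer to the same point, at the same
    instant, in the same (camera) frame.
    """
    # Normalized image coordinates and their rates.
    x = (u - cx) / fx
    y = (v - cy) / fy
    x_dot = flow_u / fx
    y_dot = flow_v / fy
    # Depth along the optical axis from the measured range.
    n = math.sqrt(x * x + y * y + 1.0)
    z = rng / n
    # Solve the 3x3 linear system:
    #   Vx - x * Vz = z * x_dot
    #   Vy - y * Vz = z * y_dot
    #   x * Vx + y * Vy + Vz = v_radial * n
    vz = (v_radial * n - z * (x * x_dot + y * y_dot)) / (n * n)
    vx = z * x_dot + x * vz
    vy = z * y_dot + y * vz
    return vx, vy, vz
```

Feeding the output of `project_motion` into `velocity_from_fusion` returns the velocity that was simulated, which makes the sketch easy to check. Real data would add noise, calibration error, and timing offsets that this round trip does not model.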
If this is right
- Produces 3D velocity estimates for a dense array of points in the scene.
- Achieves low velocity error relative to ground truth on the authors' custom dataset.
- Outperforms state-of-the-art scene flow methods.
- Supports improved path planning, collision avoidance, and object manipulation around dynamic agents.
- Ships as an open-source ROS2 package ready for field use.
Where Pith is reading between the lines
- The same sensor combination could support velocity-aware mapping for longer-term robot navigation.
- Extending the closed-form step to include the robot's own motion would allow use on moving platforms without extra compensation layers.
- The dense point velocities could feed directly into existing multi-object trackers to improve association over time.
- Public benchmark tests would clarify how the method compares when ground truth comes from different sources.
Load-bearing premise
The closed-form solution can recover accurate full 3D velocity vectors from the three sensor inputs without major degradation from calibration errors, synchronization problems, or non-rigid object motion.
What would settle it
If velocity estimates show large errors against ground truth when the sensors are deliberately desynchronized by tens of milliseconds, or when the scene contains visibly deforming objects, the closed-form recovery does not hold under those conditions.
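A rough feel for the desynchronization test can be had from geometry alone. The sketch below assumes a static sensor at the origin and a point moving at constant velocity, and computes how much the radial velocity changes when it is sampled `dt` seconds late. The function names are hypothetical, and the model ignores noise, acceleration, and non-rigid motion.

```python
import math


def radial_velocity(point, velocity):
    """Projection of the velocity onto the line of sight from the origin."""
    r = math.sqrt(sum(c * c for c in point))
    return sum(p * v for p, v in zip(point, velocity)) / r


def desync_radial_error(point, velocity, dt):
    """Change in radial velocity when the RADAR sample is taken dt seconds late.

    Assumption: constant velocity, so the point has moved by velocity * dt.
    """
    late_point = tuple(p + v * dt for p, v in zip(point, velocity))
    return radial_velocity(late_point, velocity) - radial_velocity(point, velocity)
```

Under these assumptions a purely radial motion produces no error, while a point 10 m away moving tangentially at 2 m/s picks up roughly 0.02 m/s of spurious radial velocity for a 50 ms offset.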
Original abstract
Accurate point-wise velocity estimation in 3D is crucial for robot interaction with non-rigid dynamic agents, enabling robust performance in path planning, collision avoidance, and object manipulation in dynamic environments. To this end, this paper proposes a novel RADAR, LiDAR, and camera fusion pipeline for point-wise 3D velocity estimation named CaRLi-V. This pipeline leverages raw RADAR measurements to create a novel RADAR representation, the velocity cube, which densely encodes RADAR radial velocities. By combining the velocity cube for radial velocity extraction, optical flow for tangential velocity estimation, and LiDAR for point-wise range measurements through a closed-form solution, our approach can produce 3D velocity estimates for a dense array of points. Developed as an open-source ROS2 package, CaRLi-V has been field-tested on a custom dataset and achieves low velocity error metrics relative to ground truth while outperforming state-of-the-art scene flow methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CaRLi-V, a RADAR-camera-LiDAR fusion pipeline for point-wise 3D velocity estimation. It introduces a 'velocity cube' representation derived from raw RADAR measurements to densely encode radial velocities, combines this with optical flow from the camera for tangential velocity components and LiDAR point-wise ranges, and solves for full 3D velocity vectors via a closed-form geometric solution. The method is released as an open-source ROS2 package, field-tested on a custom dataset, and claims low velocity error relative to ground truth while outperforming state-of-the-art scene flow methods.
Significance. If the closed-form fusion proves robust under realistic conditions, the work offers a lightweight, interpretable, parameter-free alternative to learning-based scene flow estimators for dense 3D velocity in dynamic robotics scenarios. Strengths include the open-source ROS2 implementation, real-world field testing, and explicit use of a closed-form solution rather than learned models.
major comments (2)
- [Abstract / pipeline description] Abstract and pipeline description: the central claim that the closed-form solution recovers accurate 3D velocities from RADAR radial components (via the velocity cube), camera tangential components, and LiDAR ranges is load-bearing for the outperformance result, yet the manuscript provides no quantitative sensitivity analysis or ablation on sensor calibration errors, synchronization offsets, or deviations from the instantaneous rigid-motion assumption. These factors are required for algebraic exactness of the decomposition.
- [Abstract] Abstract: the assertion of 'low velocity error metrics' and outperformance versus scene flow baselines is stated without any numerical values, error bars, dataset size, ground-truth acquisition details, or ablation tables, preventing verification of the soundness of the reported results.
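The kind of summary statistic the second comment asks for can be sketched as follows. This assumes the error metric is the Euclidean distance between estimated and ground-truth velocity vectors, reported as mean and standard deviation; the paper's actual metric is not stated on this page, and `velocity_error_stats` is a hypothetical name.

```python
import math


def velocity_error_stats(estimates, ground_truth):
    """Mean and population standard deviation of per-point velocity error (m/s).

    Assumption: error is the Euclidean norm of the velocity difference.
    """
    errors = [math.dist(e, g) for e, g in zip(estimates, ground_truth)]
    mean = sum(errors) / len(errors)
    variance = sum((x - mean) ** 2 for x in errors) / len(errors)
    return mean, math.sqrt(variance)
```

Reporting both numbers alongside the number of points and sequences would let a reader judge whether a claimed improvement over a baseline exceeds the spread.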
minor comments (1)
- [Abstract] The term 'velocity cube' is introduced without a formal definition or equation showing how raw RADAR returns are binned or interpolated into the dense representation.
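For orientation, the sketch below shows textbook FMCW range-Doppler processing, which is one common way raw RADAR samples are turned into a grid of radial velocities. It is not the authors' definition of the velocity cube, which this page does not give. It assumes a single ideal point target, complex baseband samples, no angle dimension, and a Hann window; the function names and every parameter value are hypothetical.

```python
import numpy as np

C = 3.0e8  # speed of light in m/s (rounded)


def simulate_chirps(rng_m, vel_mps, n_chirps, n_samples, bandwidth, chirp_time, carrier):
    """Ideal complex beat signal for one point target (hypothetical test signal)."""
    slope = bandwidth / chirp_time           # chirp slope in Hz/s
    fs = n_samples / chirp_time              # fast-time sample rate in Hz
    f_beat = 2.0 * rng_m * slope / C         # beat frequency encodes range
    f_doppler = 2.0 * vel_mps * carrier / C  # Doppler frequency encodes radial velocity
    m = np.arange(n_chirps)[:, None]         # slow-time (chirp) index
    n = np.arange(n_samples)[None, :]        # fast-time (sample) index
    phase = 2.0 * np.pi * (f_beat * n / fs + f_doppler * m * chirp_time)
    return np.exp(1j * phase)


def range_doppler_peak(iq, bandwidth, chirp_time, carrier):
    """Return (range, radial velocity, magnitude map) of the strongest cell."""
    n_chirps, n_samples = iq.shape
    window = np.hanning(n_chirps)[:, None] * np.hanning(n_samples)[None, :]
    spectrum = np.fft.fft(iq * window, axis=1)  # range FFT over fast time
    spectrum = np.fft.fft(spectrum, axis=0)     # Doppler FFT over slow time
    power = np.abs(np.fft.fftshift(spectrum, axes=0))
    # Axis labels: range bin k maps to k * c / (2 * B); Doppler f maps to f * c / (2 * fc).
    range_axis = np.arange(n_samples) * C / (2.0 * bandwidth)
    doppler_axis = np.fft.fftshift(np.fft.fftfreq(n_chirps, d=chirp_time))
    velocity_axis = doppler_axis * C / (2.0 * carrier)
    d_idx, r_idx = np.unravel_index(np.argmax(power), power.shape)
    return range_axis[r_idx], velocity_axis[d_idx], power
```

With 64 chirps of 100 µs at a 24 GHz carrier, the velocity bins in this sketch are about 0.98 m/s wide, so a simulated 4 m/s target is recovered only to within half a bin. A formal definition in the paper would need to state how such bins are interpolated or densified.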
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper to improve clarity and strengthen the supporting evidence for our claims.
- Referee: [Abstract / pipeline description] Abstract and pipeline description: the central claim that the closed-form solution recovers accurate 3D velocities from RADAR radial components (via the velocity cube), camera tangential components, and LiDAR ranges is load-bearing for the outperformance result, yet the manuscript provides no quantitative sensitivity analysis or ablation on sensor calibration errors, synchronization offsets, or deviations from the instantaneous rigid-motion assumption. These factors are required for algebraic exactness of the decomposition.
  Authors: We agree that explicit sensitivity analysis would better substantiate the robustness of the closed-form geometric solution. The derivation relies on accurate extrinsic calibration, temporal synchronization, and the approximation of constant velocity over the brief interval between sensor measurements. While our field tests on the custom dataset demonstrate practical performance, the manuscript indeed lacks a dedicated quantitative study of these factors. In the revised version we will add a new subsection (or appendix) that reports the effects of realistic calibration perturbations (e.g., 0.5–2° extrinsic errors), synchronization offsets (10–100 ms), and controlled deviations from the constant-velocity assumption, using both synthetic perturbations and re-processing of the recorded sequences. This addition will directly address the algebraic-exactness concern.
  Revision: yes
- Referee: [Abstract] Abstract: the assertion of 'low velocity error metrics' and outperformance versus scene flow baselines is stated without any numerical values, error bars, dataset size, ground-truth acquisition details, or ablation tables, preventing verification of the soundness of the reported results.
  Authors: We concur that the abstract should contain concrete numerical results to enable readers to evaluate the claims immediately. The body of the manuscript already reports specific velocity error metrics, standard deviations, dataset size (number of frames and sequences), ground-truth acquisition method, and comparison tables against scene-flow baselines. We will revise the abstract to include the key quantitative figures and dataset details while respecting length constraints, thereby improving verifiability without altering the technical content.
  Revision: yes
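The calibration perturbations mentioned in the first response (0.5–2° extrinsic errors) can be given a rough scale with a small geometric sketch. It assumes the only effect of a rotational extrinsic error is that the radial velocity is attributed to a line of sight rotated about the camera's y axis. The function names are hypothetical, and a real study would perturb the full pipeline rather than a single ray.

```python
import math


def rotate_about_y(direction, angle_deg):
    """Rotate a 3D direction about the y axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = direction
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def radial_component_error(velocity, line_of_sight, angle_deg):
    """Error in the radial component when the line of sight is misaligned.

    Assumption: line_of_sight is a unit vector and the misalignment is a pure
    rotation about the y axis.
    """
    misaligned = rotate_about_y(line_of_sight, angle_deg)
    wrong = sum(v * d for v, d in zip(velocity, misaligned))
    right = sum(v * d for v, d in zip(velocity, line_of_sight))
    return wrong - right
```

For a target crossing the optical axis at 10 m/s, a 2° misalignment in this model produces a radial error of 10·sin(2°), about 0.35 m/s, which suggests the promised sweep should be reported per target speed and direction.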
Circularity Check
No circularity: closed-form geometric fusion from independent sensor inputs
The paper's central derivation is a closed-form algebraic combination of three independent sensor-derived quantities (RADAR radial velocities via the velocity cube, camera optical flow for tangential components, and LiDAR point ranges) to recover 3D velocity vectors. No step reduces to a fitted parameter renamed as a prediction, a self-definition, or a load-bearing self-citation chain. The abstract and pipeline description present the method as a direct geometric solution without invoking prior author work to justify uniqueness or ansatz choices. This is a standard sensor-fusion approach whose validity rests on external calibration and motion assumptions rather than internal redefinition of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Sensor calibration and temporal synchronization between camera, RADAR, and LiDAR are sufficiently accurate for the closed-form solution to hold.
invented entities (1)
- velocity cube: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "By combining the velocity cube for radial velocity extraction, optical flow for tangential velocity estimation, and LiDAR for point-wise range measurements through a closed-form solution"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Paper excerpts
- Running header: IEEE Robotics and Automation Practice (RA-P).
- Table II: Sensor settings and specifications used in our dataset. Sensors: V-MD3 RADAR, Hesai QT128 LiDAR, ZED 2i camera. Setting used...
- Video caption: These are shown as velocity plots over time for the person and the retro-reflector individually. https://youtu.be/M MEd8KCGGbM
- Video 4: Video visualizing the effects of different windowing techniques on range-Doppler RADAR cube data (of which the Hanning window was selected). This is compared to the RFFT (range FFT) output of the V-MD3 RADAR, which has its...