arxiv: 2511.01383 · v2 · submitted 2025-11-03 · 💻 cs.RO

CaRLi-V: Camera-RADAR-LiDAR Point-Wise 3D Velocity Estimation

Pith reviewed 2026-05-18 01:26 UTC · model grok-4.3

classification 💻 cs.RO
keywords: 3D velocity estimation, sensor fusion, RADAR, LiDAR, camera, optical flow, robotics, scene flow

The pith

CaRLi-V fuses a RADAR velocity cube, camera optical flow, and LiDAR ranges through a closed-form solution to estimate 3D velocities for a dense set of points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CaRLi-V, a pipeline that estimates 3D velocity for many individual points in a scene by combining three sensors. Raw RADAR data is turned into a velocity cube to supply radial velocity components, optical flow from the camera supplies the tangential components, and LiDAR supplies precise ranges. These inputs are merged through a closed-form mathematical solution to recover full velocity vectors. The resulting estimates show low error against ground truth and outperform existing scene flow techniques, which matters for robots that must plan paths and avoid collisions around moving people or objects.

Core claim

By leveraging raw RADAR measurements to create a novel RADAR representation, the velocity cube, which densely encodes RADAR radial velocities, and combining the velocity cube for radial velocity extraction, optical flow for tangential velocity estimation, and LiDAR for point-wise range measurements through a closed-form solution, the approach can produce 3D velocity estimates for a dense array of points and outperforms state-of-the-art scene flow methods.

What carries the argument

The velocity cube, a dense encoding of RADAR radial velocities, which supplies one velocity component that optical flow and LiDAR ranges complete via closed-form solution to recover full 3D vectors.
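
To make the division of labour concrete, here is a minimal sketch of one way such a closed-form recovery can work for a single point. It is not taken from the paper: it assumes a pinhole camera with the intrinsics already removed, all three sensors sharing one origin and frame, and a RADAR sign convention where positive means moving away. The function name `fuse_velocity` and its parameters are hypothetical. With the point written as P = Z·m for the pixel ray m = (u, v, 1), differentiating gives V = Ż·m + Z·ṁ; optical flow supplies ṁ, LiDAR range supplies Z, and the radial measurement fixes the one remaining unknown Ż.

```python
import math


def fuse_velocity(u, v, flow_u, flow_v, rng, v_radial):
    """Recover a 3D velocity in the camera frame for one point (illustrative only).

    u, v           : normalized image coordinates (intrinsics removed)
    flow_u, flow_v : optical flow in normalized coordinates per second
    rng            : LiDAR range to the point, in metres
    v_radial       : RADAR radial velocity in m/s (assumed positive when receding)
    """
    m = (u, v, 1.0)                # ray through the pixel, so P = Z * m
    m_dot = (flow_u, flow_v, 0.0)  # time derivative of the ray from optical flow
    norm_m = math.sqrt(sum(c * c for c in m))
    depth = rng / norm_m           # depth Z along the optical axis

    # Radial constraint: v_radial = r_hat . (Z_dot * m + Z * m_dot), r_hat = m / |m|.
    m_dot_m = sum(a * b for a, b in zip(m, m_dot))
    depth_rate = (v_radial - depth * m_dot_m / norm_m) / norm_m

    return tuple(depth_rate * a + depth * b for a, b in zip(m, m_dot))


# Synthetic check: build ideal measurements from a known point and velocity.
px, py, pz = 1.0, 2.0, 4.0
vx, vy, vz = 0.5, -0.3, 0.2
rng = math.sqrt(px * px + py * py + pz * pz)
estimate = fuse_velocity(
    px / pz, py / pz,
    (vx * pz - px * vz) / pz ** 2,       # flow of u = X / Z
    (vy * pz - py * vz) / pz ** 2,       # flow of v = Y / Z
    rng,
    (px * vx + py * vy + pz * vz) / rng,  # projection of V onto the line of sight
)
print(estimate)
```

With noise-free inputs the recovered vector matches the velocity used to generate them, which is the sense in which the decomposition is algebraically exact. Any real system would also need the extrinsic transforms between sensors, which this sketch leaves out.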

If this is right

  • Produces 3D velocity estimates for a dense array of points in the scene.
  • Achieves low velocity error relative to ground truth on the authors' custom dataset.
  • Outperforms state-of-the-art scene flow methods.
  • Supports improved path planning, collision avoidance, and object manipulation around dynamic agents.
  • Ships as an open-source ROS2 package ready for field use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sensor combination could support velocity-aware mapping for longer-term robot navigation.
  • Extending the closed-form step to include the robot's own motion would allow use on moving platforms without extra compensation layers.
  • The dense point velocities could feed directly into existing multi-object trackers to improve association over time.
  • Public benchmark tests would clarify how the method compares when ground truth comes from different sources.

Load-bearing premise

The closed-form solution can recover accurate full 3D velocity vectors from the three sensor inputs without major degradation from calibration errors, synchronization problems, or non-rigid object motion.

What would settle it

Velocity estimates that show large errors against ground truth when the sensors are deliberately desynchronized by tens of milliseconds or when the scene contains visibly deforming objects would show the closed-form recovery does not hold.
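
The desynchronization test can be previewed on synthetic data. The sketch below is an assumption-laden toy, not the authors' experiment: a hypothetical point circles in the x-z plane of a pinhole camera frame, the optical flow and range are sampled at time t, and the radial velocity is sampled at t + dt. The names `fuse`, `state`, `error_with_offset`, and the constant `OMEGA` are all illustrative, and the fusion step is one plausible closed form under co-located sensors.

```python
import math

OMEGA = 2.0  # rad/s, hypothetical angular rate of the target


def fuse(u, v, du, dv, rng, v_rad):
    """One plausible closed-form fusion for co-located sensors (assumption)."""
    n = math.sqrt(u * u + v * v + 1.0)
    z = rng / n
    z_dot = (v_rad - z * (u * du + v * dv) / n) / n
    return (z_dot * u + z * du, z_dot * v + z * dv, z_dot)


def state(t):
    """Position and velocity of a point on a unit circle centred 5 m ahead."""
    p = (math.cos(OMEGA * t), 0.0, 5.0 + math.sin(OMEGA * t))
    vel = (-OMEGA * math.sin(OMEGA * t), 0.0, OMEGA * math.cos(OMEGA * t))
    return p, vel


def radial_speed(t):
    p, vel = state(t)
    return sum(a * b for a, b in zip(p, vel)) / math.sqrt(sum(a * a for a in p))


def error_with_offset(dt, t=0.0):
    """Velocity error (m/s) when the radial sample lags the others by dt seconds."""
    (x, y, z), (vx, vy, vz) = state(t)
    du = (vx * z - x * vz) / (z * z)
    dv = (vy * z - y * vz) / (z * z)
    rng = math.sqrt(x * x + y * y + z * z)
    est = fuse(x / z, y / z, du, dv, rng, radial_speed(t + dt))
    return math.sqrt((est[0] - vx) ** 2 + (est[1] - vy) ** 2 + (est[2] - vz) ** 2)


for offset in (0.0, 0.01, 0.05, 0.1):
    print(f"offset {offset * 1000:5.1f} ms -> error {error_with_offset(offset):.4f} m/s")
```

In this toy setup the error is zero for synchronized samples and grows with the offset, because the target's radial speed changes between the two sampling instants. A target moving at constant velocity along the line of sight would show no such effect, so the choice of trajectory matters when designing the real test.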

Figures

Figures reproduced from arXiv: 2511.01383 by Andres M. Diaz Aguilar, Cesar Cadena, Landson Guo, Marco Hutter, Turcan Tuna, William Talbot.

Figure 1: Resulting point clouds augmented with full velocity vectors displayed
Figure 2: The CaRLi-V pipeline is divided into three steps:
Figure 3: Effect of thresholding on the velocity cube. Thresholding removes salt
Figure 4: Plots of the magnitude of the ground truth and estimated velocity vectors, as well as their decompositions into radial and tangential components.
Abstract

Accurate point-wise velocity estimation in 3D is crucial for robot interaction with non-rigid dynamic agents, enabling robust performance in path planning, collision avoidance, and object manipulation in dynamic environments. To this end, this paper proposes a novel RADAR, LiDAR, and camera fusion pipeline for point-wise 3D velocity estimation named CaRLi-V. This pipeline leverages raw RADAR measurements to create a novel RADAR representation, the velocity cube, which densely encodes RADAR radial velocities. By combining the velocity cube for radial velocity extraction, optical flow for tangential velocity estimation, and LiDAR for point-wise range measurements through a closed-form solution, our approach can produce 3D velocity estimates for a dense array of points. Developed as an open-source ROS2 package, CaRLi-V has been field-tested on a custom dataset and achieves low velocity error metrics relative to ground truth while outperforming state-of-the-art scene flow methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes CaRLi-V, a RADAR-camera-LiDAR fusion pipeline for point-wise 3D velocity estimation. It introduces a 'velocity cube' representation derived from raw RADAR measurements to densely encode radial velocities, combines this with optical flow from the camera for tangential velocity components and LiDAR point-wise ranges, and solves for full 3D velocity vectors via a closed-form geometric solution. The method is released as an open-source ROS2 package, field-tested on a custom dataset, and claims low velocity error relative to ground truth while outperforming state-of-the-art scene flow methods.

Significance. If the closed-form fusion proves robust under realistic conditions, the work offers a lightweight, interpretable, parameter-free alternative to learning-based scene flow estimators for dense 3D velocity in dynamic robotics scenarios. Strengths include the open-source ROS2 implementation, real-world field testing, and explicit use of a closed-form solution rather than learned models.

major comments (2)
  1. [Abstract / pipeline description] Abstract and pipeline description: the central claim that the closed-form solution recovers accurate 3D velocities from RADAR radial components (via the velocity cube), camera tangential components, and LiDAR ranges is load-bearing for the outperformance result, yet the manuscript provides no quantitative sensitivity analysis or ablation on sensor calibration errors, synchronization offsets, or deviations from the instantaneous rigid-motion assumption. These factors are required for algebraic exactness of the decomposition.
  2. [Abstract] Abstract: the assertion of 'low velocity error metrics' and outperformance versus scene flow baselines is stated without any numerical values, error bars, dataset size, ground-truth acquisition details, or ablation tables, preventing verification of the soundness of the reported results.
minor comments (1)
  1. [Abstract] The term 'velocity cube' is introduced without a formal definition or equation showing how raw RADAR returns are binned or interpolated into the dense representation.
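
For readers unfamiliar with the kind of definition the minor comment asks for, the sketch below shows a generic FMCW convention for turning a Doppler spectrum into a radial velocity for one range-angle cell. It is not the paper's construction of the velocity cube, which this page does not specify. It assumes a centred (FFT-shifted) Doppler axis, the standard velocity resolution λ / (2·N·T_c) for N chirps at repetition interval T_c, and a simple fixed power threshold; the function names and all numeric values are hypothetical.

```python
def doppler_bin_to_velocity(k, n_bins, wavelength_m, chirp_interval_s):
    """Map a centred Doppler bin index to a radial velocity in m/s."""
    resolution = wavelength_m / (2.0 * n_bins * chirp_interval_s)
    return (k - n_bins // 2) * resolution


def cell_radial_velocity(power, wavelength_m, chirp_interval_s, threshold):
    """Radial velocity of the strongest Doppler bin in one cell.

    Returns None when the peak power is below the threshold, so that
    noise-only cells contribute no velocity.
    """
    peak = max(range(len(power)), key=power.__getitem__)
    if power[peak] < threshold:
        return None
    return doppler_bin_to_velocity(peak, len(power), wavelength_m, chirp_interval_s)


# Hypothetical cell: 8 Doppler bins, ~4 mm wavelength, 100 microsecond chirps.
spectrum = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 9.0, 0.2]
print(cell_radial_velocity(spectrum, 0.004, 1e-4, threshold=1.0))
print(cell_radial_velocity([0.1] * 8, 0.004, 1e-4, threshold=1.0))
```

Taking only the strongest bin discards cells that contain several movers at different speeds; a dense representation might keep the full thresholded spectrum instead, which is one of the details a formal definition would settle.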

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper to improve clarity and strengthen the supporting evidence for our claims.

  1. Referee: [Abstract / pipeline description] Abstract and pipeline description: the central claim that the closed-form solution recovers accurate 3D velocities from RADAR radial components (via the velocity cube), camera tangential components, and LiDAR ranges is load-bearing for the outperformance result, yet the manuscript provides no quantitative sensitivity analysis or ablation on sensor calibration errors, synchronization offsets, or deviations from the instantaneous rigid-motion assumption. These factors are required for algebraic exactness of the decomposition.

    Authors: We agree that explicit sensitivity analysis would better substantiate the robustness of the closed-form geometric solution. The derivation relies on accurate extrinsic calibration, temporal synchronization, and the approximation of constant velocity over the brief interval between sensor measurements. While our field tests on the custom dataset demonstrate practical performance, the manuscript indeed lacks a dedicated quantitative study of these factors. In the revised version we will add a new subsection (or appendix) that reports the effects of realistic calibration perturbations (e.g., 0.5–2° extrinsic errors), synchronization offsets (10–100 ms), and controlled deviations from the constant-velocity assumption, using both synthetic perturbations and re-processing of the recorded sequences. This addition will directly address the algebraic-exactness concern.

    revision: yes

  2. Referee: [Abstract] Abstract: the assertion of 'low velocity error metrics' and outperformance versus scene flow baselines is stated without any numerical values, error bars, dataset size, ground-truth acquisition details, or ablation tables, preventing verification of the soundness of the reported results.

    Authors: We concur that the abstract should contain concrete numerical results to enable readers to evaluate the claims immediately. The body of the manuscript already reports specific velocity error metrics, standard deviations, dataset size (number of frames and sequences), ground-truth acquisition method, and comparison tables against scene-flow baselines. We will revise the abstract to include the key quantitative figures and dataset details while respecting length constraints, thereby improving verifiability without altering the technical content.

    revision: yes
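
The calibration perturbation study proposed in the first response can be prototyped in a few lines. The sketch below is an illustrative assumption rather than the authors' protocol: it models an extrinsic yaw error between the RADAR and the other sensors as a rotation of the line of sight, so the radial speed is read along the wrong direction, and reports the resulting radial-speed error for a rigid target. The helper names `rotate_yaw`, `radial_speed`, and `radial_error` are hypothetical.

```python
import math


def rotate_yaw(vec, angle_rad):
    """Rotate a vector about the camera's vertical (y) axis."""
    x, y, z = vec
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def radial_speed(velocity, direction):
    """Project a velocity onto a (not necessarily unit) line-of-sight direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * b for a, b in zip(velocity, direction)) / norm


def radial_error(velocity, point, yaw_error_deg):
    """Radial-speed error (m/s) caused by a yaw miscalibration of the given size."""
    wrong_direction = rotate_yaw(point, math.radians(yaw_error_deg))
    return abs(radial_speed(velocity, wrong_direction) - radial_speed(velocity, point))


# Hypothetical target 5 m straight ahead, moving sideways at 1 m/s.
target_point = (0.0, 0.0, 5.0)
target_velocity = (1.0, 0.0, 0.0)
for yaw in (0.5, 1.0, 2.0):
    print(f"yaw error {yaw:.1f} deg -> radial error "
          f"{radial_error(target_velocity, target_point, yaw):.4f} m/s")
```

For this purely tangential motion the error equals the speed times the sine of the yaw error, so it stays small over the 0.5–2° range. A real ablation would also have to account for the miscalibrated ray landing on a different object or on background, which this rigid-target model ignores.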

Circularity Check

0 steps flagged

No circularity: closed-form geometric fusion from independent sensor inputs

The paper's central derivation is a closed-form algebraic combination of three independent sensor-derived quantities (RADAR radial velocities via the velocity cube, camera optical flow for tangential components, and LiDAR point ranges) to recover 3D velocity vectors. No step reduces to a fitted parameter renamed as a prediction, a self-definition, or a load-bearing self-citation chain. The abstract and pipeline description present the method as a direct geometric solution without invoking prior author work to justify uniqueness or ansatz choices. This is a standard sensor-fusion approach whose validity rests on external calibration and motion assumptions rather than internal redefinition of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach depends on accurate cross-sensor calibration and synchronization plus the validity of the closed-form geometric reconstruction; the velocity cube is a newly introduced representation whose independent validation rests on the reported experiments.

axioms (1)
  • Domain assumption: sensor calibration and temporal synchronization between camera, RADAR, and LiDAR are sufficiently accurate for the closed-form solution to hold.
    Invoked when the pipeline combines radial velocities, tangential flow, and ranges into 3D vectors.
invented entities (1)
  • velocity cube (no independent evidence)
    Purpose: dense encoding of RADAR radial velocities from raw measurements.
    New data structure introduced to enable radial velocity extraction before fusion.

pith-pipeline@v0.9.0 · 5710 in / 1376 out tokens · 33202 ms · 2026-05-18T01:26:13.376387+00:00 · methodology
