arxiv: 2605.07885 · v1 · submitted 2026-05-08 · 💻 cs.RO

AERO-VIS: Asynchronous Event-based Real-time Onboard Visual-Inertial SLAM

Pith reviewed 2026-05-11 03:36 UTC · model grok-4.3

classification 💻 cs.RO
keywords event cameras · visual-inertial SLAM · UAV navigation · asynchronous processing · onboard real-time SLAM · keypoint detection · stereo event vision

The pith

AERO-VIS is the first purely event-based inertial SLAM system to achieve closed-loop UAV control and large-scale estimation using only onboard compute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AERO-VIS, a stereo event-inertial SLAM pipeline that processes incoming events asynchronously rather than at fixed intervals. It integrates a learned keypoint detector that adapts to available compute, keeping latency low enough for real-time operation. The authors deploy the system on a UAV and report that it maintains tracking accuracy sufficient for autonomous flight control and large-scale mapping. If correct, this removes the need for traditional frame-based cameras or external processors in environments where motion blur or extreme lighting normally defeat visual navigation.
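
To make the asynchronous idea concrete, here is a minimal Python sketch. It is not the authors' scheduler: it assumes a hypothetical consumer that, whenever it becomes idle, takes every event that has arrived so far, so the batch size follows downstream runtime instead of a fixed rate. The name `simulate_async` and the two cost parameters are illustrative assumptions, and timestamps are integer microseconds.

```python
def simulate_async(event_times_us, cost_per_batch_us, cost_per_event_us):
    """Simulate an idle-triggered consumer over sorted event timestamps.

    Returns a list of (start_us, n_events, finish_us) tuples, one per batch.
    Hypothetical model for illustration only, not the AERO-VIS implementation.
    """
    batches = []
    i = 0
    now = 0
    n = len(event_times_us)
    while i < n:
        # If nothing is pending, the consumer waits for the next event.
        now = max(now, event_times_us[i])
        # Take every event that has already arrived.
        j = i
        while j < n and event_times_us[j] <= now:
            j += 1
        finish = now + cost_per_batch_us + cost_per_event_us * (j - i)
        batches.append((now, j - i, finish))
        i = j
        now = finish
    return batches


# Three events in quick succession, then one late event; 4 ms per batch.
batches = simulate_async([0, 1000, 2000, 10000], 4000, 0)
batch_sizes = [size for _, size, _ in batches]
print(batch_sizes)
```

In this toy run the second batch absorbs the two events that arrived while the first was being processed, which is the kind of load-dependent batching a fixed-rate loop would not produce.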

Core claim

We present AERO-VIS, a stereo event-inertial SLAM system with an integrated, data-driven, robust, and performance-optimized keypoint detector. By processing the event stream asynchronously, the system dynamically adapts to downstream runtime demands, ensuring low-latency and real-time performance. When deploying AERO-VIS on a UAV, we achieve unprecedented accuracy in onboard event-based SLAM. These unique characteristics enable us to present the first purely event-based inertial SLAM system that demonstrates closed-loop UAV control and large-scale state estimation while relying solely on onboard compute.

What carries the argument

Asynchronous event-stream processing combined with an integrated data-driven keypoint detector in a stereo event-inertial SLAM pipeline.

If this is right

  • The system runs in real time on onboard hardware without external GPUs or servers.
  • Event cameras provide robustness to motion blur and high dynamic range during fast UAV maneuvers.
  • Asynchronous adaptation maintains low latency even when compute load varies.
  • Large-scale state estimation becomes possible using only events and inertial data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar asynchronous pipelines could be tested on other high-speed platforms such as ground robots in low-light tunnels.
  • The learned detector might be swapped for alternative event-based feature extractors to measure the contribution of the data-driven component.
  • Extending the stereo setup to more cameras could improve scale estimation without adding synchronized frame processing.

Load-bearing premise

The asynchronous processing and data-driven detector keep tracking accurate and free of latency spikes or drift that would break closed-loop UAV control under all tested and untested flight conditions.

What would settle it

A controlled UAV flight in which event-based tracking drifts or latency spikes appear and the vehicle loses autonomous control would falsify the claim of reliable closed-loop operation.

Figures

Figures reproduced from arXiv: 2605.07885 by Leonard Freißmuth, Sebastián Barbas Laina, Simon Boche, Stefan Leutenegger, Yannick Burkhardt.

Figure 1: Estimated trajectories by AERO-VIS when employed …
Figure 2: Stereo event and IMU data are preprocessed through time synchronization and high-frequency MCTS calculation …
Figure 3: Two samples of the boxes_6dof sequence of the ECD dataset: we visualize one channel-pair with positive (red) and negative (blue) polarity for MCTS_Δt (fixed time windows) and MCTS_Ne (fixed event counts), as well as the matches predicted by SuperEvent [1], SuperEvent+ and SuperLitE. Green matches have a reprojection error of less than 5 pixels using ground truth pose measurements, while yellow matches are ou…
Figure 4: Estimated trajectory during a 2 km (20 min) urban …
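
The Figure 3 caption distinguishes fixed time windows from fixed event counts. The sketch below illustrates only that windowing difference on a list of timestamps; it does not reproduce how the paper builds its MCTS representation. The function names and the sample timestamps (integer microseconds) are assumptions made for illustration.

```python
def slice_fixed_time(timestamps_us, window_us):
    """Group sorted timestamps into consecutive windows of equal duration."""
    windows = []
    if not timestamps_us:
        return windows
    window_start = timestamps_us[0]
    current = []
    for t in timestamps_us:
        # Close windows (possibly empty ones) until t falls inside the current one.
        while t >= window_start + window_us:
            windows.append(current)
            current = []
            window_start += window_us
        current.append(t)
    windows.append(current)
    return windows


def slice_fixed_count(timestamps_us, n_events):
    """Group sorted timestamps into consecutive chunks of n_events each."""
    return [timestamps_us[i:i + n_events]
            for i in range(0, len(timestamps_us), n_events)]


# A burst, a second burst, then a long quiet period before one more event.
stamps = [0, 10, 20, 1000, 1010, 1020, 1030, 5000]
time_sizes = [len(w) for w in slice_fixed_time(stamps, 1000)]
count_sizes = [len(w) for w in slice_fixed_count(stamps, 4)]
print(time_sizes, count_sizes)
```

With fixed durations, quiet periods yield empty windows; with fixed counts, every chunk holds the same number of events but spans a variable amount of time.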
Abstract

The robustness of event cameras to high dynamic range and motion blur holds the potential to improve visual odometry systems in challenging environments. Although their high temporal resolution does not require synchronous processing, most event-based odometry methods still run at fixed rates, which simplifies system design but restricts latency and throughput. In this work, we present AERO-VIS, a stereo event-inertial SLAM system with an integrated, data-driven, robust, and performance-optimized keypoint detector. By processing the event stream asynchronously, the system dynamically adapts to downstream runtime demands, ensuring low-latency and real-time performance. When deploying AERO-VIS on a UAV, we achieve unprecedented accuracy in onboard event-based SLAM. These unique characteristics enable us to present the first purely event-based inertial SLAM system that demonstrates closed-loop UAV control and large-scale state estimation while relying solely on onboard compute. A video of the experiments and the source code are available at ethz-mrl.github.io/AERO-VIS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents AERO-VIS, a stereo event-inertial SLAM system that integrates a data-driven keypoint detector and processes the event stream asynchronously to dynamically adapt to runtime demands. It claims to deliver low-latency real-time performance on UAV onboard compute, achieving unprecedented accuracy in large-scale state estimation and enabling the first demonstration of closed-loop UAV control using a purely event-based inertial SLAM pipeline. Source code and experiment videos are released.

Significance. If the experimental validation holds, the work is significant for event-based robotics as it demonstrates practical closed-loop control and large-scale onboard SLAM in high-dynamic-range and motion-blur conditions where frame-based cameras typically fail. The explicit release of source code is a clear strength that supports reproducibility and community extension. The asynchronous design addresses a key limitation of prior fixed-rate event methods.

major comments (1)
  1. [§5] §5 (Experimental Results): The central claim of enabling stable closed-loop UAV control and large-scale estimation rests on the asynchronous pipeline and learned detector maintaining bounded latency and low drift. However, the results do not include worst-case latency histograms, event-rate vs. latency curves, or drift-vs-speed analysis under high-dynamic regimes, leaving the robustness assumption unverified for untested flight conditions.
minor comments (1)
  1. [Abstract] The abstract asserts 'unprecedented accuracy' without inline quantitative values; moving at least one key metric (e.g., ATE or RPE) into the abstract would strengthen the summary.
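
For readers unfamiliar with the metric named in the minor comment, the following is a minimal sketch of absolute trajectory error (ATE) reported as a root-mean-square value. It assumes the estimated and ground-truth positions are already time-associated and expressed in the same frame; a full evaluation would first align the two trajectories, which is omitted here. The function name `ate_rmse` and the sample data are illustrative, not values from the paper.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of position error between two equally long lists of (x, y, z).

    Assumes the trajectories are already aligned and time-associated.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up example: every estimate is offset by (3, 4, 0) from ground truth.
gt = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
est = [(3, 4, 0), (4, 4, 0), (5, 4, 0)]
error = ate_rmse(est, gt)
print(error)
```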

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and positive recommendation for minor revision. We are pleased that the significance of the asynchronous design and the release of source code are recognized. Below we address the major comment regarding the experimental results.

Point-by-point responses
  1. Referee: §5 (Experimental Results): The central claim of enabling stable closed-loop UAV control and large-scale estimation rests on the asynchronous pipeline and learned detector maintaining bounded latency and low drift. However, the results do not include worst-case latency histograms, event-rate vs. latency curves, or drift-vs-speed analysis under high-dynamic regimes, leaving the robustness assumption unverified for untested flight conditions.

    Authors: We agree that providing more detailed analysis on latency and drift would strengthen the validation of our claims. Our current experiments include multiple UAV flights in challenging conditions with varying event rates and dynamics, demonstrating real-time performance and low drift. To address this, we will include in the revised manuscript worst-case latency histograms, event-rate versus latency curves, and drift versus speed plots derived from the logged data of our high-dynamic flights. This will explicitly show the bounded latency and robustness under the tested regimes.

    Revision: yes
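
As an illustration of the latency analysis discussed above, the sketch below summarizes a list of per-update latencies into a worst case, a nearest-rank 99th percentile, and a fixed-width histogram. The function name `latency_summary`, the bin width, and the sample latencies are hypothetical; nothing here comes from the paper's logs.

```python
import math


def latency_summary(latencies_ms, bin_width_ms):
    """Return worst case, nearest-rank p99, and a histogram keyed by bin index."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the smallest sample covering 99% of the data.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    histogram = {}
    for value in ordered:
        bin_index = int(value // bin_width_ms)
        histogram[bin_index] = histogram.get(bin_index, 0) + 1
    return {"max": ordered[-1], "p99": ordered[rank - 1], "hist": histogram}


# Made-up samples: four fast updates and one spike.
summary = latency_summary([2.0, 3.5, 4.0, 4.5, 12.0], bin_width_ms=5.0)
print(summary)
```

Reporting the maximum alongside the histogram matters for closed-loop control, since a single spike can be hidden by a mean or median.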

Circularity Check

0 steps flagged

No circularity: engineering system with external validation

Full rationale

The paper describes an integrated SLAM system (asynchronous event processing, data-driven detector, stereo event-inertial fusion, UAV closed-loop control) whose claims rest on implementation details, runtime measurements, and hardware experiments rather than any derivation chain. No equations, fitted parameters, or first-principles results are presented that reduce to their own inputs by construction. The abstract and system description contain no self-definitional loops, no 'prediction' of quantities already used in fitting, and no load-bearing uniqueness theorems imported from prior self-citations. External code release and experimental results provide independent falsifiability, keeping the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The system rests on standard event-camera and IMU sensor models from prior literature without introducing new physical entities or fitted constants that define the central result.

axioms (2)
  • [domain assumption] Event cameras output asynchronous brightness-change events that are robust to motion blur and high dynamic range.
    Invoked in the abstract as the basis for improved robustness in challenging environments.
  • [standard math] Inertial measurements can be integrated with visual features for state estimation in SLAM.
    Standard visual-inertial odometry assumption used throughout the system description.

pith-pipeline@v0.9.0 · 5491 in / 1299 out tokens · 49441 ms · 2026-05-11T03:36:13.871357+00:00

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  [1] Y. Burkhardt, S. Schaefer, and S. Leutenegger, “SuperEvent: Cross-modal learning of event-based keypoint detection for SLAM,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  [2] J. Niu, S. Zhong, X. Lu, S. Shen, G. Gallego, and Y. Zhou, “ESVO2: Direct visual-inertial odometry with stereo event cameras,” IEEE Transactions on Robotics, 2025
  [3] S. Zhong, J. Niu, and Y. Zhou, “Deep visual odometry for stereo event cameras,” IEEE Robotics and Automation Letters, 2025
  [4] P. Chen, W. Guan, and P. Lu, “ESVIO: Event-based stereo visual inertial odometry,” IEEE Robotics and Automation Letters, 2023
  [5] D. Tejero-Ruiz, D. Solís-Martín, F. J. Pérez-Grau, and J. Borrego-Díaz, “Indoor UAV navigation using event cameras and intermediate frame reconstruction,” Computer Vision and Image Understanding, 2026
  [6] S. Leutenegger, “OKVIS2: Realtime scalable visual-inertial SLAM with loop closure,” arXiv preprint, 2022
  [7] A. Z. Zhu, N. Atanasov, and K. Daniilidis, “Event-based visual inertial odometry,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
  [8] A. Hadviger, I. Cvišić, I. Marković, S. Vražić, and I. Petrović, “Feature-based event stereo visual odometry,” in 2021 European Conference on Mobile Robots (ECMR), 2021
  [9] X. Lagorce, G. Orchard, F. Galluppi, B. E. Shi, and R. B. Benosman, “HOTS: A hierarchy of event-based time-surfaces for pattern recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
  [10] I. Alzugaray and M. Chli, “Asynchronous corner detection and tracking for event cameras in real time,” IEEE Robotics and Automation Letters, 2018
  [11] W. Chamorro, J. Sola, and J. Andrade-Cetto, “Event-based line SLAM in real-time,” IEEE Robotics and Automation Letters, 2022
  [12] H. Rebecq, G. Gallego, E. Mueggler, and D. Scaramuzza, “EMVS: Event-based multi-view stereo—3D reconstruction with an event camera in real-time,” International Journal of Computer Vision, 2018
  [13] W. Guan and P. Lu, “Monocular event visual inertial odometry based on event-corner using sliding windows graph-based optimization,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
  [14] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in IJCAI’81: 7th international joint conference on Artificial intelligence, 1981
  [15] H. Kim, S. Leutenegger, and A. J. Davison, “Real-time 3D reconstruction and 6-dof tracking with an event camera,” in European conference on computer vision, 2016
  [16] H. Rebecq, T. Horstschäfer, G. Gallego, and D. Scaramuzza, “EVO: A geometric approach to event-based 6-dof parallel tracking and mapping in real time,” IEEE Robotics and Automation Letters, 2016
  [17] S. Ghosh, V. Cavinato, and G. Gallego, “ES-PTAM: Event-based stereo parallel tracking and mapping,” in European Conference on Computer Vision (ECCV) Workshops, 2024
  [18] S. Ghosh and G. Gallego, “Multi-event-camera depth estimation and outlier rejection by refocused events fusion,” Advanced Intelligent Systems, 2022
  [19] Y. Zhou, G. Gallego, and S. Shen, “Event-based stereo visual odometry,” IEEE Transactions on Robotics, 2021
  [20] Z. Liu, D. Shi, R. Li, Y. Zhang, and S. Yang, “T-ESVO: improved event-based stereo visual odometry via adaptive time-surface and truncated signed distance function,” Advanced Intelligent Systems, 2023
  [21] Z. Liu, D. Shi, R. Li, and S. Yang, “ESVIO: event-based stereo visual-inertial odometry,” Sensors, 2023
  [22] J. Niu, S. Zhong, and Y. Zhou, “IMU-aided event-based stereo visual odometry,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024
  [23] C. Ye, A. Mitrokhin, C. Fermüller, J. A. Yorke, and Y. Aloimonos, “Unsupervised learning of dense optical flow, depth and egomotion from sparse event data,” arXiv preprint, 2018
  [24] A. Z. Zhu, L. Yuan, K. Chaney, and K. Daniilidis, “Unsupervised event-based learning of optical flow, depth, and egomotion,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019
  [25] S. Klenk, M. Motzet, L. Koestler, and D. Cremers, “Deep event visual odometry,” in 2024 International Conference on 3D Vision (3DV), 2024
  [26] Z. Teed, L. Lipson, and J. Deng, “Deep patch visual odometry,” Advances in Neural Information Processing Systems, 2023
  [27] W. Guan, F. Lin, P. Chen, and P. Lu, “DEIO: Deep event inertial odometry,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025
  [28] C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony, and D. Scaramuzza, “Fast image reconstruction with an event camera,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2020
  [29] P. Geneva, K. Eckenhoff, W. Lee, Y. Yang, and G. Huang, “OpenVINS: A research platform for visual-inertial estimation,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020
  [30] D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018
  [31] Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, and Y. Li, “MaxViT: Multi-axis vision transformer,” in European conference on computer vision, 2022
  [32] K. Simonyan, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint, 2014
  [33] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le, and H. Adam, “Searching for MobileNetV3,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019
  [34] D. Gálvez-López and J. D. Tardos, “Bags of binary words for fast place recognition in image sequences,” IEEE Transactions on Robotics, 2012
  [35] E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, and D. Scaramuzza, “The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM,” The International Journal of Robotics Research, 2017
  [36] J. Hidalgo-Carrió, G. Gallego, and D. Scaramuzza, “Event-aided direct sparse odometry,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
  [37] Y. Zhou, G. Gallego, H. Rebecq, L. Kneip, H. Li, and D. Scaramuzza, “Semi-dense 3D reconstruction with a stereo event camera,” in Proceedings of the European Conference on Computer Vision, 2018
  [38] S. Klenk, J. Chui, N. Demmel, and D. Cremers, “TUM-VIE: The TUM stereo visual-inertial event dataset,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
  [39] L. Gao, Y. Liang, J. Yang, S. Wu, C. Wang, J. Chen, and L. Kneip, “VECtor: A versatile event-centric benchmark for multi-sensor SLAM,” IEEE Robotics and Automation Letters, 2022
  [40] D. Tzoumanikas, W. Li, M. Grimm, K. Zhang, M. Kovac, and S. Leutenegger, “Fully autonomous micro air vehicle flight and landing on a moving target using visual–inertial estimation and model-predictive control,” Journal of Field Robotics, 2019